GLIP (https://github.com/microsoft/GLIP), which I feel has flown under the radar, is capable of zero-shot object detection. I threw together a notebook that pairs it with the recently released Segment Anything Model (https://github.com/facebookresearch/segment-anything) to do zero-shot instance segmentation: https://colab.research.google.com/drive/1kfdizAJiD5_t-M6yFBB6t2vzGrYg8SJc
submitted by /u/esmooth
For years, YouTube transcripts, while still useful, have been pretty terrible: no punctuation, poor transcriptions of heavy accents, and generally difficult to comprehend.
Lo and behold, today I watched a video from community favourite Károly Zsolnai-Fehér (you know, the Two Minute Papers guy), and... hold onto your papers: the transcript is almost flawless. Fully punctuated and accurate, even with his heavily accented English.
But I can't see any press about this. When did they transition to a new speech-to-text model? Which model is it using? Does anyone have any insight? Here is the video in question if anyone else is interested: https://www.youtube.com/watch?v=1KQc6zHOmtU
submitted by /u/Wooraah
A small project I did a while ago.
Based on a prompt, I ask GPT-4 to imagine the project name, architecture, and the tools it will use.
I then ask it to implement each file in the project.
Most of the time the project won't run, but it's a nice starting point.
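A rough sketch of this two-step flow (plan first, then one implementation query per file); `ask_llm` here is a hypothetical stand-in for whatever chat-completion call you use, returning canned answers so the sketch runs offline:

```python
import json
import os

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real GPT-4 API call; returns canned
    # answers so the sketch is runnable without an API key.
    if "List the files" in prompt:
        return json.dumps(["main.py", "utils.py"])
    return f"# generated for prompt: {prompt[:40]}\n"

def generate_project(idea: str, out_dir: str) -> list[str]:
    # Step 1: ask the model to plan the project as a JSON list of files.
    files = json.loads(ask_llm(f"List the files (JSON array) for: {idea}"))
    os.makedirs(out_dir, exist_ok=True)
    # Step 2: one API query per file to implement it.
    for name in files:
        code = ask_llm(f"Implement {name} for the project: {idea}")
        with open(os.path.join(out_dir, name), "w") as f:
            f.write(code)
    return files
```

The one-query-per-file loop is also why complex projects burn through many API calls.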
Here is the github page: https://github.com/MrNothing/AI-Genie
Note: if you ask for a complex project, it can take a lot of API queries. You have been warned!
Thank you!
submitted by /u/smilefr
Data is at the heart of machine learning (ML). Including relevant data to comprehensively represent your business problem ensures that you effectively capture trends and relationships so that you can derive the insights needed to drive business decisions. With Amazon SageMaker Canvas, you can now import data from over 40 data sources to be used […]
This is a joint post by NXP SEMICONDUCTORS N.V. & AWS Machine Learning Solutions Lab (MLSL). Machine learning (ML) is being used across a wide range of industries to extract actionable insights from data to streamline processes and improve revenue generation. In this post, we demonstrate how NXP, an industry leader in the semiconductor sector, […]
This GFN Thursday explores the many ways GeForce NOW members can play their favorite PC games across the devices they know and love. Plus, seven new games join the GeForce NOW library this week. More Ways to Play GeForce NOW is the ultimate platform for gamers who want to play across more devices than their […]
Heatmaps are widely used to interpret deep neural networks, particularly for
computer vision tasks, and heatmap-based explainable AI (XAI) techniques
are a well-researched topic. However, most studies concentrate on enhancing the
quality of the generated heatmap or discovering alternate heatmap generation
techniques, and little effort has been devoted to making heatmap-based XAI
automatic, interactive, scalable, and accessible. To address this gap, we
propose a framework that includes two modules: (1) context modelling and (2)
reasoning. We propose a template-based image captioning approach for context
modelling to create text-based contextual information from the heatmap and
input data. The reasoning module leverages a large language model to provide
explanations in combination with specialised knowledge. Our qualitative
experiments demonstrate the effectiveness of our framework and heatmap
captioning approach. The code for the proposed template-based heatmap
captioning approach will be publicly available.
Stock market forecasting has been a challenging part for many analysts and
researchers. Trend analysis, statistical techniques, and movement indicators
have traditionally been used to predict stock price movements, but text
extraction has emerged as a promising method in recent years. The use of neural
networks, especially recurrent neural networks, is abundant in the literature.
In most studies, the impact of different users was considered equal or ignored,
whereas different users can have different effects. In the current study, we introduce
TM-vector and then use this vector to train an IndRNN and ultimately model the
market users' behaviour. In the proposed model, TM-vector is simultaneously
trained with both the extracted Twitter features and market information.
Various factors have been used for the effectiveness of the proposed
forecasting approach, including the characteristics of each individual user,
their impact on each other, and their impact on the market, to predict market
direction more accurately. The Dow Jones 30 index is used in the current work.
The accuracy obtained for predicting daily stock changes of Apple is close to
95\% across various models, and the results for the other stocks are also significant.
Our results indicate the effectiveness of TM-vector in predicting stock market
direction.
Recently, fully-transformer architectures have replaced the de facto
convolutional architecture for the 3D human pose estimation task. In this paper
we propose \textbf{\textit{ConvFormer}}, a novel convolutional transformer that
leverages a new \textbf{\textit{dynamic multi-headed convolutional
self-attention}} mechanism for monocular 3D human pose estimation. We designed
a spatial and temporal convolutional transformer to comprehensively model human
joint relations within individual frames and globally across the motion
sequence. Moreover, we introduce a novel notion of \textbf{\textit{temporal
joints profile}} for our temporal ConvFormer that fuses complete temporal
information immediately for a local neighborhood of joint features. We have
quantitatively and qualitatively validated our method on three common benchmark
datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have
been conducted to identify the optimal hyper-parameter set. These experiments
demonstrated that we achieved a \textbf{significant parameter reduction
relative to prior transformer models} while attaining State-of-the-Art (SOTA)
or near SOTA on all three datasets. Additionally, we achieved SOTA for Protocol
III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on
all three metrics for the MPI-INF-3DHP dataset and for all three subjects on
HumanEva under Protocol II.
Machine learning (ML) has become critical for post-acquisition data analysis
in (scanning) transmission electron microscopy, (S)TEM, imaging and
spectroscopy. An emerging trend is the transition to real-time analysis and
closed-loop microscope operation. The effective use of ML in electron
microscopy now requires the development of strategies for microscopy-centered
experiment workflow design and optimization. Here, we discuss the associated
challenges with the transition to active ML, including sequential data analysis
and out-of-distribution drift effects, the requirements for the edge operation,
local and cloud data storage, and theory in the loop operations. Specifically,
we discuss the relative contributions of human scientists and ML agents in the
ideation, orchestration, and execution of experimental workflows and the need
to develop universal hyper languages that can apply across multiple platforms.
These considerations will collectively inform the operationalization of ML in
next-generation experimentation.
Recently developed text-to-image diffusion models make it easy to edit or
create high-quality images. Their ease of use has raised concerns about the
potential for malicious editing or deepfake creation. Imperceptible
perturbations have been proposed as a means of protecting images from malicious
editing by preventing diffusion models from generating realistic images.
However, we find that the aforementioned perturbations are not robust to JPEG
compression, which poses a major weakness because of the common usage and
availability of JPEG. We discuss the importance of robustness for additive
imperceptible perturbations and encourage alternative approaches to protect
images against editing.
We use information-theoretic tools to derive a novel analysis of Multi-source
Domain Adaptation (MDA) from the representation learning perspective.
Concretely, we study joint distribution alignment for supervised MDA with few
target labels and unsupervised MDA with pseudo labels, where the latter is
relatively hard and less commonly studied. We further provide
algorithm-dependent generalization bounds for these two settings, where the
generalization is characterized by the mutual information between the
parameters and the data. Then we propose a novel deep MDA algorithm, implicitly
addressing the target shift through joint alignment. Finally, the mutual
information bounds are extended to this algorithm providing a non-vacuous
gradient-norm estimation. The proposed algorithm has comparable performance to
the state-of-the-art on target-shifted MDA benchmark with improved memory
efficiency.
We perform an effective-theory analysis of forward-backward signal
propagation in wide and deep Transformers, i.e., residual neural networks with
multi-head self-attention blocks and multilayer perceptron blocks. This
analysis suggests particular width scalings of initialization and training
hyperparameters for these models. We then take up such suggestions, training
Vision and Language Transformers in practical setups.
We consider the problem of learning multioutput function classes in batch and
online settings. In both settings, we show that a multioutput function class is
learnable if and only if each single-output restriction of the function class
is learnable. This provides a complete characterization of the learnability of
multilabel classification and multioutput regression in both batch and online
settings. As an extension, we also consider multilabel learnability in the
bandit feedback setting and show a similar characterization as in the
full-feedback setting.
In this paper we consider a new class of RBF (Radial Basis Function) neural
networks, in which smoothing factors are replaced with shifts. We prove under
certain conditions on the activation function that these networks are capable
of approximating any continuous multivariate function on any compact subset of
the $d$-dimensional Euclidean space. For RBF networks with finitely many fixed
centroids we describe conditions guaranteeing approximation with arbitrary
precision.
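For contrast with the shifted variant studied in the paper: in a standard Gaussian RBF network with fixed centers, fitting the output weights reduces to linear least squares (a generic sketch, not the paper's construction):

```python
import numpy as np

def rbf_features(x, centers, width=0.5):
    # Gaussian RBF features: phi_i(x) = exp(-(x - c_i)^2 / width^2)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / width**2)

def fit_rbf(x, y, centers, width=0.5):
    # With fixed centers, the output weights are a linear least-squares fit.
    Phi = rbf_features(x, centers, width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w
```

For example, fitting sin on [0, 2π] with 20 evenly spaced centers recovers the function to high accuracy on the training grid, illustrating the kind of universal approximation the paper proves for its shifted networks.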
This paper focuses on optimal unimodal transformation of the score outputs of
a univariate learning model under linear loss functions. We demonstrate that
the optimal mapping between score values and the target region is a rectangular
function. To produce this optimal rectangular fit for the observed samples, we
propose a sequential approach that can update its estimate with each incoming new
sample. Our approach has logarithmic time complexity per iteration and is
optimally efficient.
This paper presents a new convergent Plug-and-Play (PnP) algorithm. PnP
methods are efficient iterative algorithms for solving image inverse problems
formulated as the minimization of the sum of a data-fidelity term and a
regularization term. PnP methods perform regularization by plugging a
pre-trained denoiser in a proximal algorithm, such as Proximal Gradient Descent
(PGD). To ensure convergence of PnP schemes, many works study specific
parametrizations of deep denoisers. However, existing results require either
unverifiable or suboptimal hypotheses on the denoiser, or assume restrictive
conditions on the parameters of the inverse problem. Observing that these
limitations can be due to the proximal algorithm in use, we study a relaxed
version of the PGD algorithm for minimizing the sum of a convex function and a
weakly convex one. When plugged with a relaxed proximal denoiser, we show that
the proposed PnP-$\alpha$PGD algorithm converges for a wider range of
regularization parameters, thus allowing more accurate image restoration.
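Schematically, a relaxed PnP-PGD iteration alternates a gradient step on the data-fidelity term with a denoising step, averaged with the previous iterate. The sketch below uses a simple moving-average filter as the plugged denoiser and a least-squares fidelity term; the paper's actual parametrization and convergence conditions are more specific:

```python
import numpy as np

def pnp_relaxed_pgd(grad_f, denoise, x0, gamma=0.9, alpha=0.7, iters=200):
    # x <- (1 - alpha) * x + alpha * D(x - gamma * grad_f(x))
    x = x0.copy()
    for _ in range(iters):
        z = x - gamma * grad_f(x)                 # gradient step on data fidelity
        x = (1 - alpha) * x + alpha * denoise(z)  # relaxed denoising step
    return x

# Toy 1-D "inverse problem": recover a smooth signal from a noisy observation y,
# with f(x) = 0.5 * ||x - y||^2 and a box filter standing in for a deep denoiser.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * t)
y = clean + 0.3 * rng.standard_normal(t.size)
box = lambda z: np.convolve(z, np.ones(5) / 5, mode="same")
restored = pnp_relaxed_pgd(lambda x: x - y, box, y)
```

The relaxation parameter alpha is exactly the kind of knob the paper shows can widen the range of admissible regularization parameters.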
We establish disintegrated PAC-Bayesian generalisation bounds for models
trained with gradient descent methods or continuous gradient flows. Contrary to
standard practice in the PAC-Bayesian setting, our result applies to
optimisation algorithms that are deterministic, without requiring any
de-randomisation step. Our bounds are fully computable, depending on the
density of the initial distribution and the Hessian of the training objective
over the trajectory. We show that our framework can be applied to a variety of
iterative optimisation algorithms, including stochastic gradient descent (SGD),
momentum-based schemes, and damped Hamiltonian dynamics.
Just stating what should be obvious.
https://raygun.com/blog/costly-software-errors-history/
https://en.wikipedia.org/wiki/List_of_software_bugs
I have no doubt that in the next few decades, A.I. will top the list of most expensive software bugs ever.
When A.I. can do superhuman things, like a calculator doing arithmetic, then it will have the power to do major oops.
submitted by /u/Terminator857
https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/
https://github.com/facebookresearch/segment-anything
Today, we aim to democratize segmentation by introducing the Segment Anything project: a new task, dataset, and model for image segmentation, as we explain in our research paper. We are releasing both our general Segment Anything Model (SAM) and our Segment Anything 1-Billion mask dataset (SA-1B), the largest ever segmentation dataset, to enable a broad set of applications and foster further research into foundation models for computer vision. We are making the SA-1B dataset available for research purposes and the Segment Anything Model is available under a permissive open license (Apache 2.0).
submitted by /u/Sirisian
You can use random projections for dimensionality reduction, allowing small neural networks to process big data. They can be fast too.
https://ai462qqq.blogspot.com/2023/04/random-projections-for-neural-networks.html
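A minimal sketch of the idea (my illustration, not from the linked post): a Gaussian random projection multiplies the data by a fixed random matrix, and for a large enough target dimension it approximately preserves pairwise distances (Johnson-Lindenstrauss):

```python
import numpy as np

def random_projection(X, k, seed=0):
    # Project rows of X from d dimensions down to k using a fixed Gaussian
    # matrix; the 1/sqrt(k) scaling preserves distances in expectation.
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R
```

The projected features can then be fed to a small network; the matrix multiply is cheap, and R never needs training.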
submitted by /u/SeanHaddPS
The rise of text and semantic search engines has made ecommerce and retail businesses search easier for its consumers. Search engines powered by unified text and image can provide extra flexibility in search solutions. You can use both text and images as queries. For example, you have a folder of hundreds of family pictures in […]
Amazon Kendra is an intelligent search service powered by machine learning (ML). We are excited to announce the launch of Amazon Kendra Featured Results. This new feature makes specific documents or content appear at the top of the search results page whenever a user issues a certain query. You can use Featured Results to improve […]
Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can. Many publishers have a large library of stock images that they use for their articles. These images can be reused many times for different stories, especially when the […]
MLPerf remains the definitive measurement for AI performance as an independent, third-party benchmark. NVIDIA’s AI platform has consistently shown leadership across both training and inference since the inception of MLPerf, including the MLPerf Inference 3.0 benchmarks released today. “Three years ago when we introduced A100, the AI world was dominated by computer vision. Generative AI […]
Seems like it could be useful to some others here
https://www.edgeimpulse.com/blog/unveiling-the-new-edge-impulse-python-sdk
submitted by /u/gtj
https://reddit.com/link/12bohof/video/i5x73plm9wra1/player
Hi guys!
We've released the Code & Gradio demo & Colab demo for our paper, DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (accepted to CVPR 2023).
- Paper: https://arxiv.org/abs/2211.16374
- Project: https://gwang-kim.github.io/datid_3d/
- Code & Gradio Demo: https://github.com/gwang-kim/DATID-3D
- Colab Demo: https://colab.research.google.com/drive/1e9NSVB7x_hjz-nr4K0jO4rfTXILnNGtA?usp=sharing
DATID-3D succeeds at text-guided domain adaptation of 3D-aware generative models while preserving the diversity inherent in the text prompt, as well as enabling high-quality pose-controlled image synthesis with excellent text-image correspondence.
We showcase the demo of text-guided manipulated 3D reconstruction beyond text-guided image manipulation!
https://i.redd.it/qadhxvpaawra1.gif
submitted by /u/ImBradleyKim
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra FAQs allow users to upload […]
Time series are sequences of data points that occur in successive order over some period of time. We often analyze these data points to make better business decisions or gain competitive advantages. An example is Shimamura Music, who used Amazon Forecast to improve shortage rates and increase business efficiency. Another great example is Arneg, who […]
NVIDIA today recognized a dozen partners in the Americas for their work enabling customers to build and deploy AI applications across a broad range of industries. NVIDIA Partner Network (NPN) Americas Partner of the Year awards were given out to companies in 13 categories covering AI, consulting, distribution, education, healthcare, integration, networking, the public sector, […]
Video editor Patrick Stirling used the Magic Mask feature in Blackmagic Design’s DaVinci Resolve software to build a custom effect that creates textured animations of people, this week in the NVIDIA Studio.
In his book The Book of Why, Judea Pearl advocates for teaching cause and effect principles to machines in order to enhance their intelligence. The accomplishments of deep learning are essentially just a type of curve fitting, whereas causality could be used to uncover interactions between the systems of the world under various constraints without […]
The size and complexity of large language models (LLMs) have exploded in the last few years. LLMs have demonstrated remarkable capabilities in learning the semantics of natural language and producing human-like responses. Many recent LLMs are fine-tuned with a powerful technique called instruction tuning, which helps the model perform new tasks or generate responses to […]
Sometimes you start a blog with a hypothesis in mind, and then that intention changes as you research and realize that your original idea was wrong. Yep, this is one of those blogs. Learning can be fun if you let go of pre-existing dogma and learn along your life journey. I’ve always been curious (a […]
The post Creating Healthy AI Utility Function: Importance of Diversity – Part I appeared first on Data Science Central.
“DribbleBot” can maneuver a soccer ball on landscapes such as sand, gravel, mud, and snow, using reinforcement learning to adapt to varying ball dynamics.
The evaluation of explanation methods is a research topic that has not yet
been explored deeply, however, since explainability is supposed to strengthen
trust in artificial intelligence, it is necessary to systematically review and
compare explanation methods in order to confirm their correctness. Until now,
no tool focused on XAI evaluation exists that exhaustively and speedily
allows researchers to evaluate the performance of explanations of neural
network predictions. To increase transparency and reproducibility in the field,
we therefore built Quantus -- a comprehensive evaluation toolkit in Python
that includes a growing, well-organised collection of evaluation metrics and
tutorials for evaluating explainable methods. The toolkit has been thoroughly
tested and is available under an open-source license on PyPi (or on
https://github.com/understandable-machine-intelligence-lab/Quantus/).
Domain adaptation of GANs is a problem of fine-tuning the state-of-the-art
GAN models (e.g. StyleGAN) pretrained on a large dataset to a specific domain
with few samples (e.g. painting faces, sketches, etc.). While there are a great
number of methods that tackle this problem in different ways, there are still
many important questions that remain unanswered.
In this paper, we provide a systematic and in-depth analysis of the domain
adaptation problem of GANs, focusing on the StyleGAN model. First, we perform a
detailed exploration of the most important parts of StyleGAN that are
responsible for adapting the generator to a new domain depending on the
similarity between the source and target domains. As a result of this in-depth
study, we propose new efficient and lightweight parameterizations of StyleGAN
for domain adaptation. Particularly, we show there exist directions in
StyleSpace (StyleDomain directions) that are sufficient for adapting to similar
domains and they can be reduced further. For dissimilar domains, we propose
Affine$+$ and AffineLight$+$ parameterizations that allow us to outperform
existing baselines in few-shot adaptation in the low-data regime. Finally, we
examine StyleDomain directions and discover their many surprising properties
that we apply for domain mixing and cross-domain image morphing.
Data privacy and ownership are significant in social data science, raising
legal and ethical concerns. Sharing and analyzing data is difficult when
different parties own different parts of it. An approach to this challenge is
to apply de-identification or anonymization techniques to the data before
collecting it for analysis. However, this can reduce data utility and increase
the risk of re-identification. To address these limitations, we present PADME,
a distributed analytics tool that federates model implementation and training.
PADME uses a federated approach where the model is implemented and deployed by
all parties and visits each data location incrementally for training. This
enables the analysis of data across locations while still allowing the model to
be trained as if all data were in a single location. Training the model on data
in its original location preserves data ownership. Furthermore, the results are
not provided until the analysis is completed on all data locations to ensure
privacy and avoid bias in the results.
In smart electrical grids, fault detection tasks may have a high impact on
society due to their economic and critical implications. In recent years,
numerous smart grid applications, such as defect detection and load
forecasting, have embraced data-driven methodologies. The purpose of this study
is to investigate the challenges associated with the security of machine
learning (ML) applications in the smart grid scenario. Indeed, the robustness
and security of these data-driven algorithms have not been extensively studied
in relation to all power grid applications. We demonstrate first that the deep
neural network method used in the smart grid is susceptible to adversarial
perturbation. Then, we highlight how studies on fault localization and type
classification illustrate the weaknesses of present ML algorithms in smart
grids against various adversarial attacks.
In this paper, we propose a methodology to align a medium-sized GPT model,
originally trained in English for an open domain, to a small closed domain in
Spanish. The application for which the model is fine-tuned is the question
answering task. To achieve this we also needed to train and implement another
neural network (which we called the reward model) that could score and
determine whether an answer is appropriate for a given question. This component
served to improve the decoding and generation of the answers of the system.
Numerical metrics such as BLEU and perplexity were used to evaluate the model,
and human judgment was also used to compare the decoding technique with others.
Finally, the results favored the proposed method, and it was determined that it
is feasible to use a reward model to align the generation of responses.
We study the convex hulls of reachable sets of nonlinear systems with bounded
disturbances. Reachable sets play a critical role in control, but remain
notoriously challenging to compute, and existing over-approximation tools tend
to be conservative or computationally expensive. In this work, we exactly
characterize the convex hulls of reachable sets as the convex hulls of
solutions of an ordinary differential equation from all possible initial values
of the disturbances. This finite-dimensional characterization unlocks a tight
estimation algorithm to over-approximate reachable sets that is significantly
faster and more accurate than existing methods. We present applications to
neural feedback loop analysis and robust model predictive control.
We consider the problem of online multiclass learning when the number of
labels is unbounded. We show that the Multiclass Littlestone dimension, first
introduced in \cite{DanielyERMprinciple}, continues to characterize online
learnability in this setting. Our result complements the recent work by
\cite{Brukhimetal2022} who give a characterization of batch multiclass
learnability when the label space is unbounded.
People with diabetes have to manage their blood glucose level to keep it
within an appropriate range. Predicting whether future glucose values will be
outside the healthy threshold is of vital importance in order to take
corrective actions to avoid potential health damage. In this paper we describe
our research with the aim of predicting the future behavior of blood glucose
levels, so that hypoglycemic events may be anticipated. The approach of this
work is the application of transformation functions on glucose time series, and
their use in convolutional neural networks. We have tested our proposed method
using real data from 4 different diabetes patients with promising results.
Many organizations measure treatment effects via an experimentation platform
to evaluate the causal effect of product variations prior to full-scale
deployment. However, standard experimentation platforms do not perform
optimally for end user populations that exhibit heterogeneous treatment effects
(HTEs). Here we present a personalized experimentation framework, Personalized
Experiments (PEX), which optimizes treatment group assignment at the user level
via HTE modeling and sequential decision policy optimization to improve
multiple short-term and long-term outcomes simultaneously. We describe an
end-to-end workflow that has proven to be successful in practice and can be
readily implemented using open-source software.
Metadata quality is crucial for digital objects to be discovered through
digital library interfaces. However, due to various reasons, the metadata of
digital objects often exhibits incomplete, inconsistent, and incorrect values.
We investigate methods to automatically detect, correct, and canonicalize
scholarly metadata, using seven key fields of electronic theses and
dissertations (ETDs) as a case study. We propose MetaEnhance, a framework that
utilizes state-of-the-art artificial intelligence methods to improve the
quality of these fields. To evaluate MetaEnhance, we compiled a metadata
quality evaluation benchmark containing 500 ETDs, by combining subsets sampled
using multiple criteria. We tested MetaEnhance on this benchmark and found that
the proposed methods achieved nearly perfect F1-scores in detecting errors and
F1-scores in correcting errors ranging from 0.85 to 1.00 for five of seven
fields.
In this paper, we introduce the range of oBERTa language models, an
easy-to-use set of language models, which allows Natural Language Processing
(NLP) practitioners to obtain between 3.8 and 24.3 times faster models without
expertise in model compression. Specifically, oBERTa extends existing work on
pruning, knowledge distillation, and quantization and leverages frozen
embeddings to improve knowledge distillation, and improved model initialization
to deliver higher accuracy on a broad range of transfer tasks. In generating
oBERTa, we explore how the highly optimized RoBERTa differs from BERT with
respect to pruning during pre-training and fine-tuning, and find it less
amenable to compression during fine-tuning. We explore the use of oBERTa on
seven representative NLP tasks and find that the improved compression
techniques allow a pruned oBERTa model to match the performance of BERTBASE and
exceed the performance of Prune OFA Large on the SQUAD V1.1 Question Answering
dataset, despite being 8x and 2x, respectively, faster in inference. We release
our code, training regimes, and associated model for broad usage to encourage
usage and experimentation.
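As a generic illustration of one ingredient here (this is not the oBERTa code), unstructured magnitude pruning simply zeros out the smallest-magnitude fraction of a weight tensor:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the `sparsity` fraction of entries with smallest |w|.
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned
```

Real pruning pipelines like the ones oBERTa extends do this gradually during training and combine it with distillation and quantization, but the core operation is this simple.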
In this paper, we revisit the problem of Differentially Private Stochastic
Convex Optimization (DP-SCO) in Euclidean and general $\ell_p^d$ spaces.
Specifically, we focus on three settings that are still far from well
understood: (1) DP-SCO over a constrained and bounded (convex) set in Euclidean
space; (2) unconstrained DP-SCO in $\ell_p^d$ space; (3) DP-SCO with
heavy-tailed data over a constrained and bounded set in $\ell_p^d$ space. For
problem (1), for both convex and strongly convex loss functions, we propose methods whose outputs achieve (expected) excess population risks that depend only on the Gaussian width of the constraint set rather than on the dimension of the space. Moreover, we show that the bound for strongly convex functions is optimal up to a logarithmic factor. For problems (2) and (3), we
propose several novel algorithms and provide the first theoretical results for
both cases when $1<p<2$ and $2\leq p\leq \infty$.
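The paper's algorithms differ per setting, but a standard template they relate to, noisy projected gradient descent over a bounded convex set, can be sketched as follows. This is our own generic illustration, not the paper's methods; all names and constants are arbitrary, and a real privacy guarantee would calibrate the noise scale to the target (epsilon, delta).

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_gd(grad, project, d, steps=200, lr=0.05, sigma=0.1):
    """Generic noisy projected gradient descent, a common template for
    DP-SCO over a bounded convex set (illustrative, not the paper's
    algorithms). Gaussian noise of scale sigma is added to each gradient;
    in a real DP analysis sigma is derived from (epsilon, delta) and the
    gradient sensitivity."""
    w = np.zeros(d)
    for _ in range(steps):
        g = grad(w) + sigma * rng.standard_normal(d)
        w = project(w - lr * g)
    return w

# Toy problem: minimize ||w - c||^2 / 2 over the unit L2 ball.
c = np.array([0.5, -0.25])
grad = lambda w: w - c
project = lambda w: w / max(1.0, np.linalg.norm(w))
w_hat = dp_gd(grad, project, d=2)
```

Despite the injected noise, the iterate contracts toward the (feasible) minimizer, with a residual error governed by the step size and noise scale.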
( 2
min )
Here's a video that presents a very interesting solution to alignment problems: https://youtu.be/fKgPg_j9eF0
Hope you learned something new!
submitted by /u/RamazanBlack
[link] [comments]
( 42
min )
Hi all,
Recently I wrote a short blog article on predicting football (soccer) match outcomes using machine learning and bookmakers' odds. I also tested real betting scenarios using the ML predictions. TL;DR: using ML together with bookmakers' odds beats the accuracies reported in the literature, but it is not enough to provide consistent profit.
Blog post : https://medium.com/@grstathis/predicting-football-soccer-match-outcomes-using-bookmaker-betting-odds-477c62b2e0e9
I hope it is something interesting, feedback is always welcome :-)
submitted by /u/touristroni
[link] [comments]
( 43
min )
https://www.youtube.com/watch?v=ZZ0atq2yYJw&list=LL&index=3
submitted by /u/norcalnatv
[link] [comments]
( 43
min )
Paper: https://arxiv.org/abs/2303.17580
Abstract:
Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence (AGI). While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a system that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., HuggingFace) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in HuggingFace, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in HuggingFace, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards AGI.
submitted by /u/Singularian2501
[link] [comments]
( 48
min )
Bloomberg released BloombergGPT for finance, the first LLM of its kind for the domain.
https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/
I also reviewed the announcement and the publication on Medium; it should give you a TL;DR of a very long article.
https://pub.towardsai.net/bloomberggpt-the-first-gpt-for-finance-72670f99566a
submitted by /u/Ok-Range1608
[link] [comments]
( 47
min )
From the same lab that developed FlashAttention. They tried their approach with 64k tokens, if I read this correctly, and claim it can be scaled up massively.
Blogpost: https://hazyresearch.stanford.edu/blog/2023-03-27-long-learning
Paper: https://arxiv.org/abs/2302.10866#
submitted by /u/ReasonablyBadass
[link] [comments]
( 46
min )
Epic Games has announced a new project that will allow developers to train ML agents in Unreal Engine.
Post here:
https://dev.epicgames.com/community/learning/tutorials/8OWY/unreal-engine-learning-agents-introduction
Can't wait to play with it! It has only just been announced so no estimate on when they will release it (in beta/experimental form).
submitted by /u/romantimm25
[link] [comments]
( 42
min )
Hey, I was just watching the GTC 2023 keynote with NVIDIA CEO Jensen Huang (the shorter version) and something struck me: it somehow looks weird. Maybe those are just video compression artifacts, but his lower face is very blurry, and you can clearly see it if you pause (not a cherry-picked screencap). Check his keynote from last year; there is no blur at all. The 2023 video looks weird in other ways too, the lip sync is slightly off, and so on. I know NVIDIA showed a short Jensen Huang deepfake two years ago, so does this mean that this year they decided to generate the whole keynote and nobody noticed?
submitted by /u/wojtek15
[link] [comments]
( 43
min )
https://youtu.be/AaTRHFaaPG8
This guy is one of the key experts and has a video online called We're all going to die!
It would be great if someone could edit this down to the key points.
submitted by /u/zascar
[link] [comments]
( 42
min )
https://github.com/kabouzeid/turm
My latest side project: a simple, lazygit-style TUI for the Slurm Workload Manager. I'm still adding functionality, but I wanted to share what I have so far and get feedback from the community.
submitted by /u/kabouzeid
[link] [comments]
( 43
min )
https://www.openpetition.eu/petition/online/securing-our-digital-future-a-cern-for-open-source-large-scale-ai-research-and-its-safety
Join us in our urgent mission to democratize AI research by establishing an international, publicly funded supercomputing facility equipped with 100,000 state-of-the-art AI accelerators to train open source foundation models. This monumental initiative will secure our technological independence, empower global innovation, and ensure safety, while safeguarding our democratic principles for generations to come.
submitted by /u/stringShuffle
[link] [comments]
( 49
min )
Train a general DNN from scratch to automatically achieve both high performance and a slim structure.
Publications in ICLR 2023 and NeurIPS 2021.
Github: https://github.com/tianyic/only_train_once
submitted by /u/No-Egg6431
[link] [comments]
( 44
min )
This post was co-written with Tony Momenpour and Drew Clark from KYTC. Government departments and businesses operate contact centers to connect with their communities, enabling citizens and customers to call to make appointments, request services, and sometimes just ask a question. When there are more calls than agents can answer, callers get placed on hold […]
( 7
min )
Intelligent document processing (IDP) with AWS helps automate information extraction from documents of different types and formats, quickly and with high accuracy, without the need for machine learning (ML) skills. Faster information extraction with high accuracy can help you make quality business decisions on time, while reducing overall costs. For more information, refer to Intelligent […]
( 8
min )
Like many managers in the corporate world, until recently I thought you should not use these tools. The common theme is that it’s for small projects or classroom problems. Not for the real world. Then, in the process of designing a new course, I had to work with notebooks. Because all classes use notebooks these… Read More »My First Notebook and Colab Project: Sharing my Thoughts
The post My First Notebook and Colab Project: Sharing my Thoughts appeared first on Data Science Central.
( 21
min )
Machine learning (ML) and Artificial Intelligence (AI) have been receiving a lot of public interest in recent years, with both terms being practically common in the IT language. Despite their similarities, there are some important differences between ML and AI that are frequently neglected. Thus we will cover the key differences between ML and AI… Read More »Machine Learning (ML) vs Artificial Intelligence (AI) — Crucial Differences
The post Machine Learning (ML) vs Artificial Intelligence (AI) — Crucial Differences appeared first on Data Science Central.
( 23
min )
AI Weirdness: the strange side of machine learning
( 2
min )
MIT researchers built DiffDock, a model that may one day be able to find new drugs faster than traditional methods and reduce the potential for adverse side effects.
( 10
min )
Several research works have applied Reinforcement Learning (RL) algorithms to
solve the Rate Adaptation (RA) problem in Wi-Fi networks. The dynamic nature of
the radio link requires the algorithms to be responsive to changes in link
quality. Delays in the execution of the algorithm may be detrimental to its
performance, which in turn may decrease network performance. This aspect has
been overlooked in the state of the art. In this paper, we present an analysis
of common computational delays in RL-based RA algorithms, and propose a
methodology that may be applied to reduce these computational delays and
increase the efficiency of this type of algorithms. We apply the proposed
methodology to an existing RL-based RA algorithm. The obtained experimental
results indicate a reduction of one order of magnitude in the execution time of
the algorithm, improving its responsiveness to link quality changes.
( 2
min )
Continual learning (CL) aims to learn a sequence of tasks over time, with
data distributions shifting from one task to another. When training on new task
data, data representations from old tasks may drift. Some negative
representation drift can result in catastrophic forgetting, by causing the
locally learned class prototypes and data representations to correlate poorly
across tasks. To mitigate such representation drift, we propose a method that
finds global prototypes to guide the learning, and learns data representations
with the regularization of the self-supervised information. Specifically, for
NLP tasks, we formulate each task in a masked language modeling style, and
learn the task via a neighbor attention mechanism over a pre-trained language
model. Experimental results show that our proposed method can learn fairly
consistent representations with less representation drift, and significantly
reduce catastrophic forgetting in CL without resampling data from past tasks.
( 2
min )
Offline reinforcement learning (RL) allows for the training of competent
agents from offline datasets without any interaction with the environment.
Online finetuning of such offline models can further improve performance. But
how should we ideally finetune agents obtained from offline RL training? While
offline RL algorithms can in principle be used for finetuning, in practice,
their online performance improves slowly. In contrast, we show that it is
possible to use standard online off-policy algorithms for faster improvement.
However, we find this approach may suffer from policy collapse, where the
policy undergoes severe performance deterioration during initial online
learning. We investigate the issue of policy collapse and how it relates to
data diversity, algorithm choices and online replay distribution. Based on
these insights, we propose a conservative policy optimization procedure that
can achieve stable and sample-efficient online learning from offline
pretraining.
( 2
min )
Multi-view clustering (MvC) aims at exploring the category structure among
multi-view data without label supervision. Multiple views provide more
information than single views, and thus existing MvC methods can achieve satisfactory performance. However, their performance might seriously degrade when the views are noisy in practical scenarios. In this paper, we first
formally investigate the drawback of noisy views and then propose a
theoretically grounded deep MvC method (namely MvCAN) to address this issue.
Specifically, we propose a novel MvC objective that enables un-shared
parameters and inconsistent clustering predictions across multiple views to
reduce the side effects of noisy views. Furthermore, a non-parametric iterative
process is designed to generate a robust learning target for mining multiple
views' useful information. Theoretical analysis reveals that MvCAN works by
achieving the multi-view consistency, complementarity, and noise robustness.
Finally, experiments on public datasets demonstrate that MvCAN outperforms
state-of-the-art methods and is robust against the existence of noisy views.
( 2
min )
In the field of functional genomics, the analysis of gene expression profiles
through Machine and Deep Learning is increasingly providing meaningful insight
into a number of diseases. The paper proposes a novel algorithm to perform
Feature Selection on genomic-scale data, which exploits the reconstruction
capabilities of autoencoders and an ad-hoc defined Explainable Artificial
Intelligence-based score in order to select the most informative genes for
diagnosis, prognosis, and precision medicine. Results of the application on a
Chronic Lymphocytic Leukemia dataset evidence the effectiveness of the
algorithm, by identifying and suggesting a set of meaningful genes for further
medical investigation.
( 2
min )
Given a graph with a subset of labeled nodes, we are interested in the
quality of the averaging estimator which for an unlabeled node predicts the
average of the observations of its labeled neighbours. We rigorously study
concentration properties, variance bounds and risk bounds in this context.
While the estimator itself is very simple and the data generating process is
too idealistic for practical applications, we believe that our small steps will
contribute towards the theoretical understanding of more sophisticated methods
such as Graph Neural Networks.
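The estimator described above is simple enough to write down directly. A toy sketch (the helper name and the example graph are ours):

```python
import numpy as np

def averaging_estimator(adj, labels, labeled_mask):
    """For each unlabeled node, predict the mean observation of its
    labeled neighbours (NaN if it has none).

    adj          : (n, n) symmetric 0/1 adjacency matrix
    labels       : (n,) observations (only labeled entries matter)
    labeled_mask : (n,) boolean array marking labeled nodes
    """
    adj = np.asarray(adj, dtype=float)
    preds = np.full(len(labels), np.nan)
    for v in np.flatnonzero(~labeled_mask):
        nbrs = np.flatnonzero(adj[v] > 0)
        nbrs = nbrs[labeled_mask[nbrs]]     # keep labeled neighbours only
        if nbrs.size:
            preds[v] = labels[nbrs].mean()
    return preds

# Path graph 0-1-2; nodes 0 and 2 are labeled, node 1 is unlabeled.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
labels = np.array([1.0, 0.0, 3.0])
mask = np.array([True, False, True])
pred = averaging_estimator(adj, labels, mask)
# node 1's labeled neighbours are 0 and 2, so its prediction is (1 + 3) / 2 = 2
```

The theoretical questions of the paper (concentration, variance and risk bounds) concern this exact predictor under an assumed data-generating process.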
( 2
min )
We propose a generic spatiotemporal framework to analyze manifold-valued
measurements, which allows for employing an intrinsic and computationally
efficient Riemannian hierarchical model. Particularly, utilizing regression, we
represent discrete trajectories in a Riemannian manifold by composite Bézier
splines, propose a natural metric induced by the Sasaki metric to compare the
trajectories, and estimate average trajectories as group-wise trends. We
evaluate our framework in comparison to state-of-the-art methods within
qualitative and quantitative experiments on hurricane tracks. Notably, our
results demonstrate the superiority of spline-based approaches for an intensity
classification of the tracks.
( 2
min )
We show that symmetrically padded convolution can be analytically inverted
via DFT. We comprehensively analyze several different symmetric and
anti-symmetric padding modes and show that multiple cases exist where the
inversion can be achieved. The implementation is available at
\url{https://github.com/prclibo/iconv_dft}.
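As a warm-up for the idea above (not the paper's symmetric-padding derivation, which is more involved), the simpler circular-padding case already shows why the DFT matters: circular convolution is diagonalized by the DFT, so it can be inverted by pointwise division in the frequency domain whenever no DFT coefficient of the kernel vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n)          # signal
k = np.zeros(n)
k[:3] = [0.5, 0.3, 0.2]             # kernel, zero-padded to length n

# Circular convolution via the DFT (convolution theorem).
y = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

# Inversion: divide pointwise by the kernel's DFT.
K = np.fft.fft(k)
assert np.all(np.abs(K) > 1e-8)     # invertible iff no DFT coefficient is zero
x_rec = np.fft.ifft(np.fft.fft(y) / K).real
```

The paper's contribution is the analogous analytic inversion for symmetric and anti-symmetric padding modes, where a plain DFT diagonalization does not apply directly.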
( 2
min )
We present DiffCollage, a compositional diffusion model that can generate
large content by leveraging diffusion models trained on generating pieces of
the large content. Our approach is based on a factor graph representation where
each factor node represents a portion of the content and a variable node
represents their overlap. This representation allows us to aggregate
intermediate outputs from diffusion models defined on individual nodes to
generate content of arbitrary size and shape in parallel without resorting to
an autoregressive generation procedure. We apply DiffCollage to various tasks,
including infinite image generation, panorama image generation, and
long-duration text-guided motion generation. Extensive experimental results
with a comparison to strong autoregressive baselines verify the effectiveness
of our approach.
( 2
min )
Conventional optimization methods in machine learning and controls rely
heavily on first-order update rules. Selecting the right method and
hyperparameters for a particular task often involves trial-and-error or
practitioner intuition, motivating the field of meta-learning. We generalize a
broad family of preexisting update rules by proposing a meta-learning framework
in which the inner loop optimization step involves solving a differentiable
convex optimization (DCO). We illustrate the theoretical appeal of this
approach by showing that it enables one-step optimization of a family of linear
least squares problems, given that the meta-learner has sufficient exposure to
similar tasks. Various instantiations of the DCO update rule are compared to
conventional optimizers on a range of illustrative experimental settings.
( 2
min )
In orthogonal world coordinates, a Manhattan world lying along cuboid
buildings is widely useful for various computer vision tasks. However, the
Manhattan world has much room for improvement because the origin of pan angles estimated from an image is arbitrary; that is, the pan angles have a four-fold rotationally symmetric ambiguity. To address this problem, we propose a definition for the
pan-angle origin based on the directions of the roads with respect to a camera
and the direction of travel. We propose a learning-based calibration method
that uses heatmap regression to remove the ambiguity by each direction of
labeled image coordinates, similar to pose estimation keypoints.
Simultaneously, our two-branched network recovers the rotation and removes
fisheye distortion from a general scene image. To alleviate the lack of
vanishing points in images, we introduce auxiliary diagonal points that have
the optimal 3D arrangement of spatial uniformity. Extensive experiments
demonstrated that our method outperforms conventional methods on large-scale
datasets and with off-the-shelf cameras.
( 2
min )
The practice of uncertainty quantification (UQ) validation, notably in
machine learning for the physico-chemical sciences, rests on several graphical
methods (scattering plots, calibration curves, reliability diagrams and
confidence curves) which explore complementary aspects of calibration, without
covering all the desirable ones. For instance, none of these methods deals with
the reliability of UQ metrics across the range of input features (adaptivity).
Based on the complementary concepts of consistency and adaptivity, the toolbox
of common validation methods for variance- and interval-based UQ metrics is
revisited with the aim to provide a better grasp on their capabilities. This
study is conceived as an introduction to UQ validation, and all methods are
derived from a few basic rules. The methods are illustrated and tested on
synthetic datasets and representative examples extracted from the recent
physico-chemical machine learning UQ literature.
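One of the graphical tools mentioned above, the confidence curve, is easy to sketch: discard points in order of decreasing predicted uncertainty and track the error of what remains. The function below is our own illustration, not code from the study; for a well-behaved UQ metric the curve should decrease.

```python
import numpy as np

def confidence_curve(errors, uncertainties):
    """MAE of the remaining set after discarding the k most uncertain
    points, for k = 0 .. n-1 (most uncertain removed first)."""
    order = np.argsort(uncertainties)[::-1]      # most uncertain first
    abs_err = np.abs(errors)[order]
    return np.array([abs_err[k:].mean() for k in range(len(abs_err))])

# Perfectly adaptive toy case: predicted uncertainty equals the actual |error|,
# so the curve is strictly decreasing.
err = np.array([4.0, 1.0, 3.0, 2.0])
curve = confidence_curve(err, np.abs(err))
```

Note that a decreasing confidence curve alone does not establish the adaptivity across input features that the paper emphasizes; it is one complementary diagnostic among several.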
( 2
min )
Multi-label learning is usually used to mine the correlation between features and labels, and feature selection can retain as much information as possible through a small number of features. The $\ell_{2,1}$ regularization method yields a sparse coefficient matrix, but it cannot solve the multicollinearity problem effectively. The model proposed in this paper obtains the most relevant few features by solving a joint constrained optimization problem with $\ell_{2,1}$ and $\ell_{F}$ regularization. In the manifold regularization, we implement a random-walk strategy based on a joint information matrix and obtain a highly robust neighborhood graph. In addition, we give an algorithm for solving the model and prove its convergence. Comparative experiments on real-world data sets show that the proposed method outperforms other methods.
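To illustrate the generic mechanism by which an $\ell_{2,1}$-sparse coefficient matrix yields feature selection (our own toy sketch, not the paper's joint $\ell_{2,1}$/$\ell_F$ solver): row i of the coefficient matrix W couples feature i to all labels, so features can be ranked by the $\ell_2$ norm of their rows and near-zero rows dropped.

```python
import numpy as np

# Hypothetical coefficient matrix W: 3 features x 2 labels, as might be
# produced by an l_{2,1}-regularized multi-label model.
W = np.array([[0.9, -0.8],    # feature 0: informative
              [0.0,  0.01],   # feature 1: ~irrelevant (near-zero row)
              [0.5,  0.6]])   # feature 2: informative

row_norms = np.linalg.norm(W, axis=1)        # l2 norm per feature row
selected = np.argsort(row_norms)[::-1][:2]   # keep the top-2 features
```

The $\ell_{2,1}$ penalty is precisely the sum of these row norms, which is what drives entire rows toward zero during training.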
( 2
min )
A common lens to theoretically study neural net architectures is to analyze
the functions they can approximate. However, constructions from approximation
theory may be unrealistic and therefore less meaningful. For example, a common
unrealistic trick is to encode target function values using infinite precision.
To address these issues, this work proposes a formal definition of
statistically meaningful (SM) approximation which requires the approximating
network to exhibit good statistical learnability. We study SM approximation for
two function classes: boolean circuits and Turing machines. We show that
overparameterized feedforward neural nets can SM approximate boolean circuits
with sample complexity depending only polynomially on the circuit size, not the
size of the network. In addition, we show that transformers can SM approximate
Turing machines with computation time bounded by $T$ with sample complexity
polynomial in the alphabet size, state space size, and $\log (T)$. We also
introduce new tools for analyzing generalization which provide much tighter
sample complexities than the typical VC-dimension or norm-based bounds, which
may be of independent interest.
( 3
min )
We propose to compute Wasserstein barycenters (WBs) by solving for Monge maps with a variational principle. We discuss the metric properties of WBs and explore
their connections, especially the connections of Monge WBs, to K-means
clustering and co-clustering. We also discuss the feasibility of Monge WBs on
unbalanced measures and spherical domains. We propose two new problems --
regularized K-means and Wasserstein barycenter compression. We demonstrate the
use of VWBs in solving these clustering-related problems.
( 2
min )
This paper studies the approximation capacity of ReLU neural networks with
norm constraint on the weights. We prove upper and lower bounds on the
approximation error of these networks for smooth function classes. The lower
bound is derived through the Rademacher complexity of neural networks, which
may be of independent interest. We apply these approximation bounds to analyze
the convergences of regression using norm constrained neural networks and
distribution estimation by GANs. In particular, we obtain convergence rates for
over-parameterized neural networks. It is also shown that GANs can achieve
optimal rate of learning probability distributions, when the discriminator is a
properly chosen norm constrained neural network.
( 2
min )
The extragradient (EG), introduced by G. M. Korpelevich in 1976, is a
well-known method to approximate solutions of saddle-point problems and their
extensions such as variational inequalities and monotone inclusions. Over the
years, numerous variants of EG have been proposed and studied in the
literature. Recently, these methods have gained popularity due to new
applications in machine learning and robust optimization. In this work, we
survey the latest developments in the EG method and its variants for
approximating solutions of nonlinear equations and inclusions, with a focus on
the monotonicity and co-hypomonotonicity settings. We provide a unified
convergence analysis for different classes of algorithms, with an emphasis on
sublinear best-iterate and last-iterate convergence rates. We also discuss
recent accelerated variants of EG based on both Halpern fixed-point iteration
and Nesterov's accelerated techniques. Our approach uses simple arguments and
basic mathematical tools to make the proofs as elementary as possible, while
maintaining generality to cover a broad range of problems.
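For readers unfamiliar with EG, the classical step on the toy bilinear saddle point min_x max_y xy (whose unique solution is (0, 0)) can be sketched as follows; this is our own minimal illustration, with an arbitrary step size and iteration count. Plain simultaneous gradient descent-ascent diverges on this problem, while EG converges.

```python
def eg_step(x, y, gamma=0.1):
    """One extragradient step for min_x max_y x*y."""
    # extrapolation ("look-ahead") step at the current point
    x_half = x - gamma * y
    y_half = y + gamma * x
    # update step, using the gradients at the extrapolated point
    return x - gamma * y_half, y + gamma * x_half

x, y = 1.0, 1.0
for _ in range(2000):
    x, y = eg_step(x, y)
# the iterates spiral into the saddle point (0, 0)
```

The extrapolation step is the whole trick: evaluating the gradient at the look-ahead point damps the rotation that makes naive descent-ascent diverge on monotone problems.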
( 2
min )
We present a hierarchical Bayesian learning approach to infer jointly sparse
parameter vectors from multiple measurement vectors. Our model uses separate
conditionally Gaussian priors for each parameter vector and common
gamma-distributed hyper-parameters to enforce joint sparsity. The resulting
joint-sparsity-promoting priors are combined with existing Bayesian inference
methods to generate a new family of algorithms. Our numerical experiments,
which include a multi-coil magnetic resonance imaging application, demonstrate
that our new approach consistently outperforms commonly used hierarchical
Bayesian methods.
( 2
min )
Paper: https://arxiv.org/abs/2303.16434
Abstract:
Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still face difficulties with some specialized tasks because they lack enough domain specific data during pre-training or they often have errors in their neural network computations on those tasks that need accurate executions. On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain …
( 45
min )
How easy? As easy as:
python usap_csv_eval.py data/credit-approval.csv
If your dataset is in CSV format, you can use this tool to get an initial indication of how predictable a target feature is. There is no need to sort attributes, look for missing cells, etc.
The tool uses "deodel" as a robust mixed attribute classifier. Get more details at:
csv_dataset_eval.ipynb
submitted by /u/zx2zx
[link] [comments]
( 44
min )
With the right building blocks, machine-learning models can more accurately perform tasks like fraud detection or spam filtering.
( 9
min )
Amazon Personalize is excited to announce the new Trending-Now recipe to help you recommend items gaining popularity at the fastest pace among your users. Amazon Personalize is a fully managed machine learning (ML) service that makes it easy for developers to deliver personalized experiences to their users. It enables you to improve customer engagement by […]
( 10
min )
In football, ball possession is a strong predictor for team success. It’s hard to control the game without having control over the ball. In the past three Bundesliga seasons, as well as in the current season (at the time of this writing), Bayern Munich is ranked first in the table and in ball possession percentage, […]
( 8
min )
The Bundesliga is renowned for its exceptional goalkeepers, making it potentially the most prominent among Europe’s top five leagues in this regard. Apart from the widely recognized Manuel Neuer, the Bundesliga has produced remarkable goalkeepers who have excelled in other leagues, including the likes of Marc-André ter Stegen, who is a superstar at Barcelona. In […]
( 9
min )
Powerful new large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come. In this new Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these […]
The post AI Frontiers: AI for health and the future of research with Peter Lee appeared first on Microsoft Research.
( 27
min )
It’s another rewarding GFN Thursday, with 23 new games for April on top of 11 joining the cloud this week and a new Marvel’s Midnight Suns reward now available first for GeForce NOW Premium members. Newark, N.J., is next to complete its upgrade to RTX 4080 SuperPODs, making it the 12th region worldwide to bring Read article >
( 6
min )
There are plenty of graph neural network (GNN) accelerators being proposed. However, they rely heavily on users' hardware expertise and are usually optimized for one specific GNN model, making them challenging for practical use. Therefore, in this work, we propose GNNBuilder, the first automated, generic, end-to-end GNN accelerator generation framework. It features four advantages: (1) GNNBuilder can automatically generate GNN accelerators for a wide range of GNN models arbitrarily defined by users; (2) GNNBuilder takes the standard PyTorch programming interface, introducing zero overhead for algorithm developers; (3)
GNNBuilder supports end-to-end code generation, simulation, accelerator
optimization, and hardware deployment, realizing a push-button fashion for GNN
accelerator design; (4) GNNBuilder is equipped with accurate performance models
of its generated accelerator, enabling fast and flexible design space
exploration (DSE). In the experiments, first, we show that our accelerator
performance model has errors within $36\%$ for latency prediction and $18\%$
for BRAM count prediction. Second, we show that our generated accelerators can
outperform CPU by $6.33\times$ and GPU by $6.87\times$. This framework is
open-source, and the code is available at
https://anonymous.4open.science/r/gnn-builder-83B4/.
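GNNBuilder consumes models written against the standard PyTorch interface; the computation such a model describes is message passing. As a rough, framework-free sketch of what one GNN layer computes (numpy stands in for PyTorch here, and the mean-aggregation layer is generic, not GNNBuilder-specific):

```python
import numpy as np

def gnn_layer(adj, feats, weight):
    """One mean-aggregation GNN layer: average neighbor features, then
    apply a linear transform and a ReLU. `adj` is a dense 0/1 adjacency
    matrix with self-loops; real accelerators work on sparse formats."""
    deg = adj.sum(axis=1, keepdims=True)          # node degrees
    agg = (adj @ feats) / np.maximum(deg, 1)      # mean over neighbors
    return np.maximum(agg @ weight, 0.0)          # linear + ReLU

# Tiny 3-node graph (fully connected with self-loops), 2-d features.
adj = np.ones((3, 3))
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weight = np.eye(2)
out = gnn_layer(adj, feats, weight)
```

An accelerator generator's job is to turn stacks of such layers into pipelined hardware, which is what the framework automates.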
( 2
min )
We present a multimodal deep learning (MDL) framework for predicting physical
properties of a 10-dimensional acrylic polymer composite material by merging
physical attributes and chemical data. Our MDL model comprises four modules,
including three generative deep learning models for material structure
characterization and a fourth model for property prediction. Our approach
handles an 18-dimensional complexity, with 10 compositional inputs and 8
property outputs, successfully predicting 913,680 property data points across
114,210 composition conditions. This level of complexity is unprecedented in
computational materials science, particularly for materials with undefined
structures. We propose a framework to analyze the high-dimensional information
space for inverse material design, demonstrating flexibility and adaptability
to various materials and scales, provided sufficient data is available. This
study advances future research on different materials and the development of
more sophisticated models, drawing us closer to the ultimate goal of predicting
all properties of all materials.
( 2
min )
In this work we develop a novel approach using deep neural networks to
reconstruct the conductivity distribution in elliptic problems from one
internal measurement. The approach is based on a mixed reformulation of the
governing equation and utilizes the standard least-squares objective to
approximate the conductivity and flux simultaneously, with deep neural networks
as ansatz functions. We provide a thorough analysis of the neural network
approximations for both continuous and empirical losses, including rigorous
error estimates that are explicit in terms of the noise level, various penalty
parameters and neural network architectural parameters (depth, width and
parameter bound). We also provide extensive numerical experiments in two- and
multi-dimensions to illustrate distinct features of the approach, e.g.,
excellent stability with respect to data noise and capability of solving
high-dimensional problems.
( 2
min )
An increasing part of energy is produced from renewable sources by a large
number of small producers. The efficiency of these sources is volatile and, to
some extent, random, exacerbating the energy market balance problem. In many
countries, that balancing is performed on day-ahead (DA) energy markets. In
this paper, we consider automated trading on a DA energy market by a medium
size prosumer. We model this activity as a Markov Decision Process and
formalize a framework in which a ready-to-use strategy can be optimized with
real-life data. We synthesize parametric trading strategies and optimize them
with an evolutionary algorithm. We also use state-of-the-art reinforcement
learning algorithms to optimize a black-box trading strategy fed with available
information from the environment that can impact future prices.
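As an illustration of the parametric-strategy-plus-evolution idea (the strategy class, prices, and numbers below are invented for the sketch, not taken from the paper), a (1+1) evolution strategy can tune a single limit-price parameter against synthetic day-ahead prices:

```python
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(20, 80, size=365)   # synthetic DA prices (EUR/MWh)
VALUE = 50.0                             # illustrative value of 1 MWh to the prosumer

def profit(theta):
    """Bid-at-limit-price strategy: buy 1 MWh whenever the DA price is
    below the limit `theta`; each purchase is worth VALUE to us."""
    buys = prices < theta
    return np.sum(VALUE - prices[buys])

# (1+1)-evolution strategy: mutate the parameter, keep improvements.
theta, best = 30.0, profit(30.0)
for _ in range(500):
    cand = theta + rng.normal(0, 2.0)
    f = profit(cand)
    if f > best:
        theta, best = cand, f
```

The optimizer drives the limit toward the unit value (50 here); real strategies would have many parameters and be evaluated on historical market data.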
( 2
min )
This note focuses on a simple approach to the unified analysis of SGD-type
methods from (Gorbunov et al., 2020) for strongly convex smooth optimization
problems. The similarities in the analyses of different stochastic first-order
methods are discussed along with the existing extensions of the framework. The
limitations of the analysis and several alternative approaches are mentioned as
well.
( 2
min )
The generalization performance of deep neural networks with regard to the
optimization algorithm is one of the major concerns in machine learning. This
performance can be affected by various factors. In this paper, we theoretically
prove that the Lipschitz constant of a loss function is an important factor to
diminish the generalization error of the output model obtained by Adam or
AdamW. The results can be used as a guideline for choosing the loss function
when the optimization algorithm is Adam or AdamW. In addition, to evaluate the
theoretical bound in a practical setting, we choose the human age estimation
problem in computer vision. For assessing the generalization better, the
training and test datasets are drawn from different distributions. Our
experimental evaluation shows that the loss function with lower Lipschitz
constant and maximum value improves the generalization of the model trained by
Adam or AdamW.
( 2
min )
We investigate semantic guarantees of private learning algorithms for their
resilience to training Data Reconstruction Attacks (DRAs) by informed
adversaries. To this end, we derive non-asymptotic minimax lower bounds on the
adversary's reconstruction error against learners that satisfy differential
privacy (DP) and metric differential privacy (mDP). Furthermore, we demonstrate
that our lower bound analysis for the latter also covers the high dimensional
regime, wherein the input data dimensionality may be larger than the
adversary's query budget. Motivated by the theoretical improvements conferred
by metric DP, we extend the privacy analysis of popular deep learning
algorithms such as DP-SGD and Projected Noisy SGD to cover the broader notion
of metric differential privacy.
( 2
min )
The paper discusses the limitations of deep learning models in identifying
and utilizing features that remain invariant under a bijective transformation
on the data entries, which we refer to as combinatorial patterns. We argue that
the identification of such patterns may be important for certain applications
and suggest providing neural networks with information that fully describes the
combinatorial patterns of input entries and allows the network to determine
what is relevant for prediction. To demonstrate the feasibility of this
approach, we present a combinatorial convolutional neural network for word
classification.
( 2
min )
Predicting crime using machine learning and deep learning techniques has
gained considerable attention from researchers in recent years, focusing on
identifying patterns and trends in crime occurrences. This review paper
examines over 150 articles to explore the various machine learning and deep
learning algorithms applied to predict crime. The study provides access to the
datasets used for crime prediction by researchers and analyzes prominent
approaches applied in machine learning and deep learning algorithms to predict
crime, offering insights into different trends and factors related to criminal
activities. Additionally, the paper highlights potential gaps and future
directions that can enhance the accuracy of crime prediction. Finally, the
comprehensive overview of research discussed in this paper on crime prediction
using machine learning and deep learning approaches serves as a valuable
reference for researchers in this field. By gaining a deeper understanding of
crime prediction techniques, law enforcement agencies can develop strategies to
prevent and respond to criminal activities more effectively.
( 3
min )
Classical results in neural network approximation theory show how arbitrary
continuous functions can be approximated by networks with a single hidden
layer, under mild assumptions on the activation function. However, the
classical theory does not give a constructive means to generate the network
parameters that achieve a desired accuracy. Recent results have demonstrated
that for specialized activation functions, such as ReLUs and some classes of
analytic functions, high accuracy can be achieved via linear combinations of
randomly initialized activations. These recent works utilize specialized
integral representations of target functions that depend on the specific
activation functions used. This paper defines mollified integral
representations, which provide a means to form integral representations of
target functions using activations for which no direct integral representation
is currently known. The new construction enables approximation guarantees for
randomly initialized networks for a variety of widely used activation
functions.
( 2
min )
This paper provides a finite-time analysis of linear stochastic approximation
(LSA) algorithms with fixed step size, a core method in statistics and machine
learning. LSA is used to compute approximate solutions of a $d$-dimensional
linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$ for which
$(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by
(asymptotically) unbiased observations
$\{(\mathbf{A}(Z_n),\mathbf{b}(Z_n))\}_{n \in \mathbb{N}}$. We consider here
the case where $\{Z_n\}_{n \in \mathbb{N}}$ is an i.i.d. sequence or a
uniformly geometrically ergodic Markov chain. We derive $p$-th moment and
high-probability deviation bounds for the iterates defined by LSA and its
Polyak-Ruppert-averaged version. Our finite-time instance-dependent bounds for
the averaged LSA iterates are sharp in the sense that the leading term we
obtain coincides with the local asymptotic minimax limit. Moreover, the
remainder terms of our bounds admit a tight dependence on the mixing time
$t_{\operatorname{mix}}$ of the underlying chain and the norm of the noise
variables. We emphasize that our result requires the SA step size to scale only
with logarithm of the problem dimension $d$.
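A minimal simulation of the fixed-step-size LSA iteration with Polyak-Ruppert averaging in the i.i.d.-noise case (the Markov-chain setting and the paper's finite-time bounds are beyond this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A_bar = np.eye(d) + 0.1 * rng.standard_normal((d, d))
A_bar = A_bar @ A_bar.T + np.eye(d)         # well-conditioned, positive definite
b_bar = rng.standard_normal(d)
theta_star = np.linalg.solve(A_bar, b_bar)  # fixed point of the iteration

alpha, n_iter = 0.05, 20000
theta = np.zeros(d)
avg = np.zeros(d)
for n in range(1, n_iter + 1):
    # Unbiased noisy observations of (A_bar, b_bar); here i.i.d. Gaussian noise.
    A_n = A_bar + 0.1 * rng.standard_normal((d, d))
    b_n = b_bar + 0.1 * rng.standard_normal(d)
    theta = theta - alpha * (A_n @ theta - b_n)   # fixed-step LSA update
    avg += (theta - avg) / n                      # Polyak-Ruppert running average

err = np.linalg.norm(avg - theta_star)
```

The averaged iterate concentrates near $\theta^\star$ even though the raw iterate keeps fluctuating at a scale set by the step size.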
( 2
min )
In this paper we derive a Probably Approximately Correct (PAC)-Bayesian error
bound for linear time-invariant (LTI) stochastic dynamical systems with inputs.
Such bounds are widespread in machine learning, and they are useful for
characterizing the predictive power of models learned from finitely many data
points. In particular, the bound derived in this paper relates future
average prediction errors with the prediction error generated by the model on
the data used for learning. In turn, this allows us to provide finite-sample
error bounds for a wide class of learning/system identification algorithms.
Furthermore, as LTI systems are a sub-class of recurrent neural networks
(RNNs), these error bounds could be a first step towards PAC-Bayesian bounds
for RNNs.
( 2
min )
This paper considers binary classification of high-dimensional features under
a postulated model with a low-dimensional latent Gaussian mixture structure and
non-vanishing noise. A generalized least squares estimator is used to estimate
the direction of the optimal separating hyperplane. The estimated hyperplane is
shown to interpolate on the training data. While the direction vector can be
consistently estimated as could be expected from recent results in linear
regression, a naive plug-in estimate fails to consistently estimate the
intercept. A simple correction, that requires an independent hold-out sample,
renders the procedure minimax optimal in many scenarios. The interpolation
property of the latter procedure can be retained, but surprisingly depends on
the way the labels are encoded.
( 2
min )
We propose a linear contextual bandit algorithm with $O(\sqrt{dT\log T})$
regret bound, where $d$ is the dimension of contexts and $T$ is the time
horizon. Our proposed algorithm is equipped with a novel estimator in which
exploration is embedded through explicit randomization. Depending on the
randomization, our proposed estimator takes contributions either from contexts
of all arms or from selected contexts. We establish a self-normalized bound for
our estimator, which allows a novel decomposition of the cumulative regret into
\textit{additive} dimension-dependent terms instead of multiplicative terms. We
also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem
setting. Hence, the regret of our proposed algorithm matches the lower bound up
to logarithmic factors. The numerical experiments support the theoretical
guarantees and show that our proposed method outperforms the existing linear
bandit algorithms.
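The paper's randomized estimator is not reproduced here; for context, a standard self-normalized, optimism-based linear contextual bandit baseline (LinUCB-style, with an illustrative confidence multiplier) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 4, 2000
theta_true = rng.standard_normal(d) / np.sqrt(d)

V = np.eye(d)                 # regularized Gram matrix
xy = np.zeros(d)              # running sum of context * reward
regret = 0.0
for t in range(T):
    X = rng.standard_normal((K, d))           # one context per arm
    theta_hat = np.linalg.solve(V, xy)        # ridge estimate
    # Optimistic score: estimated reward + self-normalized confidence width.
    widths = np.sqrt(np.einsum("kd,dj,kj->k", X, np.linalg.inv(V), X))
    a = int(np.argmax(X @ theta_hat + 0.5 * widths))
    r = X[a] @ theta_true + 0.1 * rng.standard_normal()
    V += np.outer(X[a], X[a])
    xy += X[a] * r
    regret += np.max(X @ theta_true) - X[a] @ theta_true
```

The proposed method differs in where the randomization enters the estimator, which is what enables the additive regret decomposition.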
( 2
min )
Orthogonality constraints naturally appear in many machine learning problems,
from Principal Components Analysis to robust neural network training. They are
usually solved using Riemannian optimization algorithms, which minimize the
objective function while enforcing the constraint. However, enforcing the
orthogonality constraint can be the most time-consuming operation in such
algorithms. Recently, Ablin & Peyr\'e (2022) proposed the Landing algorithm, a
method with cheap iterations that does not enforce the orthogonality constraint
but is attracted towards the manifold in a smooth manner. In this article, we
provide new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold, the set of rectangular
orthogonal matrices. We also consider stochastic and variance reduction
algorithms when the cost function is an average of many functions. We
demonstrate that all these methods have the same rate of convergence as their
Riemannian counterparts that exactly enforce the constraint. Finally, our
experiments demonstrate the promise of our approach to an array of
machine-learning problems that involve orthogonality constraints.
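The distinguishing piece of the Landing method is the attraction field $N(X) = X(X^\top X - I)$, which replaces costly retractions. A numpy sketch of that term alone, driving a random matrix onto the Stiefel manifold (the full algorithm also adds a relative-gradient step for the objective):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3                               # Stiefel manifold St(n, p): X^T X = I_p
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, 2)                 # start inside the unit spectral ball
eta_lam = 0.05                            # step size times attraction weight

for _ in range(400):
    # Attraction field N(X) = X (X^T X - I): pulls X onto the manifold
    # using only cheap matrix products, no retraction or orthogonalization.
    X = X - eta_lam * X @ (X.T @ X - np.eye(p))

dist = np.linalg.norm(X.T @ X - np.eye(p))
```

Each iteration costs two small matrix multiplications, versus a QR factorization or similar for an exact retraction.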
( 2
min )
Inverse optimal control methods can be used to characterize behavior in
sequential decision-making tasks. Most existing work, however, requires the
control signals to be known, or is limited to fully-observable or linear
systems. This paper introduces a probabilistic approach to inverse optimal
control for stochastic non-linear systems with missing control signals and
partial observability that unifies existing approaches. By using an explicit
model of the noise characteristics of the sensory and control systems of the
agent in conjunction with local linearization techniques, we derive an
approximate likelihood for the model parameters, which can be computed within a
single forward pass. We evaluate our proposed method on stochastic and
partially observable version of classic control tasks, a navigation task, and a
manual reaching task. The proposed method has broad applicability, ranging from
imitation learning to sensorimotor neuroscience.
( 2
min )
Individualized treatment decisions can improve health outcomes, but using
data to make these decisions in a reliable, precise, and generalizable way is
challenging with a single dataset. Leveraging multiple randomized controlled
trials allows for the combination of datasets with unconfounded treatment
assignment to improve the power to estimate heterogeneous treatment effects.
This paper discusses several non-parametric approaches for estimating
heterogeneous treatment effects using data from multiple trials. We extend
single-study methods to a scenario with multiple trials and explore their
performance through a simulation study, with data generation scenarios that
have differing levels of cross-trial heterogeneity. The simulations demonstrate
that methods that directly allow for heterogeneity of the treatment effect
across trials perform better than methods that do not, and that the choice of
single-study method matters based on the functional form of the treatment
effect. Finally, we discuss which methods perform well in each setting and then
apply them to four randomized controlled trials to examine effect heterogeneity
of treatments for major depressive disorder.
( 2
min )
In this paper, we propose a randomly projected convex clustering model for
clustering a collection of $n$ high dimensional data points in $\mathbb{R}^d$
with $K$ hidden clusters. Compared to the convex clustering model for
clustering original data with dimension $d$, we prove that, under some mild
conditions, the perfect recovery of the cluster membership assignments of the
convex clustering model, if exists, can be preserved by the randomly projected
convex clustering model with embedding dimension $m = O(\epsilon^{-2}\log(n))$,
where $0 < \epsilon < 1$ is some given parameter. We further prove that the
embedding dimension can be improved to be $O(\epsilon^{-2}\log(K))$, which is
independent of the number of data points. Extensive numerical experiment
results will be presented in this paper to demonstrate the robustness and
superior performance of the randomly projected convex clustering model. The
numerical results presented in this paper also demonstrate that the randomly
projected convex clustering model can outperform the randomly projected K-means
model in practice.
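The core idea the model shares with other sketching methods, a Johnson-Lindenstrauss-style Gaussian projection with embedding dimension $m = O(\epsilon^{-2}\log n)$, can be sketched as follows (nearest-center assignment stands in for the convex clustering model, which is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 300, 1000, 3
# Three well-separated clusters in high dimension.
centers = 10.0 * rng.standard_normal((K, d))
labels = np.repeat(np.arange(K), n // K)
X = centers[labels] + rng.standard_normal((n, d))

eps = 0.5
m = int(np.ceil(4 * np.log(n) / eps**2))          # O(eps^-2 log n) target dim
P = rng.standard_normal((m, d)) / np.sqrt(m)      # Gaussian random projection
Y = X @ P.T                                       # projected data, shape (n, m)

# Nearest projected-center assignment still recovers the clusters.
proj_centers = np.array([Y[labels == k].mean(axis=0) for k in range(K)])
assign = np.argmin(((Y[:, None, :] - proj_centers[None]) ** 2).sum(-1), axis=1)
```

The paper's sharper result is that $m$ can even be taken as $O(\epsilon^{-2}\log K)$, independent of $n$.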
( 2
min )
The maximum likelihood method is the best-known method for estimating the
probabilities behind the data. However, the conventional method obtains the
probability model closest to the empirical distribution, resulting in
overfitting. Regularization methods prevent the model from being excessively
close to the wrong probability, but little is known systematically about their
performance. The idea of regularization is similar to
error-correcting codes, which obtain optimal decoding by mixing suboptimal
solutions with an incorrectly received code. The optimal decoding in
error-correcting codes is achieved based on gauge symmetry. We propose a
theoretically guaranteed regularization in the maximum likelihood method by
focusing on a gauge symmetry in Kullback -- Leibler divergence. In our
approach, we obtain the optimal model without the need to search for
hyperparameters frequently appearing in regularization.
( 2
min )
submitted by /u/currentscurrents
[link] [comments]
( 45
min )
Lightning AI released Lit-LLaMa: an architecture based on Meta’s LLaMa but with a more permissive license. However, they still rely on the weights trained by Meta, which have a license restricting commercial usage.
Is developing the architecture enough to change the license associated with the model’s weights?
submitted by /u/murphwalker
[link] [comments]
( 47
min )
submitted by /u/transdimensionalmeme
[link] [comments]
( 42
min )
submitted by /u/farraway45
[link] [comments]
( 49
min )
submitted by /u/faxfrag
[link] [comments]
( 42
min )
Great listen discussing AGI and ChatGPT
submitted by /u/acatinasweater
[link] [comments]
( 42
min )
submitted by /u/fignewtgingrich
[link] [comments]
( 42
min )
This is a guest post by Neslihan Erdogan, Global Industrial IT Manager at HAYAT HOLDING. With the ongoing digitization of the manufacturing processes and Industry 4.0, there is enormous potential to use machine learning (ML) for quality prediction. Process manufacturing is a production method that uses formulas or recipes to produce goods by combining ingredients […]
( 11
min )
On November 30, 2021, we announced the general availability of Amazon SageMaker Canvas, a visual point-and-click interface that enables business analysts to generate highly accurate machine learning (ML) predictions without having to write a single line of code. With Canvas, you can take ML mainstream throughout your organization so business analysts without data science or […]
( 7
min )
The United Nations (UN) was founded in 1945 by 51 original Member States committed to maintaining international peace and security, developing friendly relations among nations, and promoting social progress, better living standards, and human rights. The UN is currently made up of 193 Member States and has evolved over the years to keep pace with […]
( 9
min )
With further development, the programmable system could be used in a range of applications including gene and cancer therapies.
( 8
min )
Announcements New Books and Courses Explore Synthetic Data, ML Strategies MLtechniques released two new books recently. The first one, now at version 4.1, deals with synthetic data. This updated version includes a chapter on GANs (generative adversarial networks), with a comparison to more traditional methods such as copulas. Applied to real-life datasets, the author discusses the… Read More »DSC Weekly 29 March 2023 – New Books and Courses Explore Synthetic Data, ML Strategies
The post DSC Weekly 29 March 2023 – New Books and Courses Explore Synthetic Data, ML Strategies appeared first on Data Science Central.
( 19
min )
Blender, the world’s most popular 3D creation suite — free and open source — released its major version 3.5 update. Expected to have a profound impact on 3D creative workflows, this latest release features support for Open Shading Language (OSL) shaders with the NVIDIA OptiX ray-tracing engine.
( 7
min )
Tools like ChatGPT have awakened the world to the potential of generative AI. Now, much more is coming. On the latest episode of the NVIDIA AI Podcast, Yves Jacquier, executive director of Ubisoft La Forge, shares valuable insights into the transformative potential of generative AI in the gaming industry. With over two decades of experience Read article >
( 5
min )
submitted by /u/nickb
[link] [comments]
( 41
min )
We revisit the Gaussian process model with spherical harmonic features and
study connections between the associated RKHS, its eigenstructure and deep
models. Based on this, we introduce a new class of kernels which correspond to
deep models of continuous depth. In our formulation, depth can be estimated as
a kernel hyper-parameter by optimizing the evidence lower bound. Further, we
introduce sparseness in the eigenbasis by variational learning of the spherical
harmonic phases. This enables scaling to larger input dimensions than
previously, while also allowing for learning of high frequency variations. We
validate our approach on machine learning benchmark datasets.
( 2
min )
The estimation of the generalization error of classifiers often relies on a
validation set. Such a set is hardly available in few-shot learning scenarios,
a highly disregarded shortcoming in the field. In these scenarios, it is common
to rely on features extracted from pre-trained neural networks combined with
distance-based classifiers such as nearest class mean. In this work, we
introduce a Gaussian model of the feature distribution. By estimating the
parameters of this model, we are able to predict the generalization error on
new classification tasks with few samples. We observe that accurate distance
estimates between class-conditional densities are the key to accurate estimates
of the generalization performance. Therefore, we propose an unbiased estimator
for these distances and integrate it in our numerical analysis. We empirically
show that our approach outperforms alternatives such as the leave-one-out
cross-validation strategy.
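A 1-D, equal-variance caricature of the idea: under a Gaussian feature model, the accuracy of a nearest-class-mean rule is $\Phi(d/2\sigma)$, so estimating the between-class distance yields an accuracy prediction without a validation set (the paper's model and its unbiased distance estimator are richer than this):

```python
import numpy as np
from math import erf, sqrt

def phi(z):                      # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

rng = np.random.default_rng(0)
mu0, mu1, sigma = 0.0, 2.0, 1.0
x0 = rng.normal(mu0, sigma, 5000)
x1 = rng.normal(mu1, sigma, 5000)

# Predicted accuracy from the Gaussian model: Phi(distance / (2 * sigma)).
d_hat = abs(x1.mean() - x0.mean())
pred_acc = phi(d_hat / (2 * sigma))

# Empirical accuracy of the nearest-class-mean classifier at the midpoint.
thresh = (x0.mean() + x1.mean()) / 2
emp_acc = 0.5 * ((x0 < thresh).mean() + (x1 >= thresh).mean())
```

The prediction tracks the empirical accuracy closely, which is why distance estimation quality is the crux.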
( 2
min )
Images generated by high-resolution SAR have vast areas of application as
they can work better in adverse light and weather conditions. One such area of
application is in the military systems. This study is an attempt to explore the
suitability of current state-of-the-art models introduced in the domain of
computer vision for SAR target classification (MSTAR). Since the application of
any solution produced for military systems would be strategic and real-time,
accuracy is often not the only criterion to measure its performance. Other
important parameters like prediction time and input resiliency are equally
important. The paper deals with these issues in the context of SAR images.
Experimental results show that deep learning models can be suitably applied in
the domain of SAR image classification with the desired performance levels.
( 2
min )
There is considerable evidence that machine learning algorithms have better
predictive abilities than humans in various financial settings. But, the
literature has not tested whether these algorithmic predictions are more
rational than human predictions. We study the predictions of corporate earnings
from several algorithms, notably linear regressions and a popular algorithm
called Gradient Boosted Regression Trees (GBRT). On average, GBRT outperformed
both linear regressions and human stock analysts, but it still overreacted to
news and did not satisfy rational expectations as normally defined. By reducing
the learning rate, the magnitude of overreaction can be minimized, but it comes
with the cost of poorer out-of-sample prediction accuracy. Human stock analysts
who have been trained in machine learning methods overreact less than
traditionally trained analysts. Additionally, stock analyst predictions reflect
information not otherwise available to machine algorithms.
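To make the learning-rate trade-off concrete, here is a toy from-scratch gradient-boosted stump regressor on synthetic data (not the earnings data, and far simpler than a production GBRT): at a fixed number of rounds, a smaller learning rate leaves more of the signal unfit.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-d inputs for residuals r."""
    best = (np.inf, 0.0, r.mean(), r.mean())
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    return best[1:]                      # (split point, left value, right value)

def gbrt(x, y, n_rounds, lr):
    pred = np.full_like(y, y.mean())
    for _ in range(n_rounds):
        s, lv, rv = fit_stump(x, y - pred)        # fit a stump to residuals
        pred += lr * np.where(x <= s, lv, rv)     # shrink by the learning rate
    return pred

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
pred_hi = gbrt(x, y, 50, lr=0.5)
pred_lo = gbrt(x, y, 50, lr=0.05)
```

Slower learning damps how hard each round chases the latest residuals, which is the mechanism behind the reduced overreaction the paper reports.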
( 2
min )
Estimating the generalization performance is practically challenging on
out-of-distribution (OOD) data without ground truth labels. While previous
methods emphasize the connection between distribution difference and OOD
accuracy, we show that a large domain gap does not necessarily lead to a low test
accuracy. In this paper, we investigate this problem from the perspective of
feature separability, and propose a dataset-level score based upon feature
dispersion to estimate the test accuracy under distribution shift. Our method
is inspired by desirable properties of features in representation learning:
high inter-class dispersion and high intra-class compactness. Our analysis
shows that inter-class dispersion is strongly correlated with the model
accuracy, while intra-class compactness does not reflect the generalization
performance on OOD data. Extensive experiments demonstrate the superiority of
our method in both prediction performance and computational efficiency.
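One plausible reading of a dataset-level inter-class dispersion score (the paper's exact definition may differ) is the mean pairwise distance between class-mean feature vectors:

```python
import numpy as np

def inter_class_dispersion(feats, labels):
    """Mean pairwise Euclidean distance between class-mean feature vectors.
    Higher values indicate better-separated classes in feature space."""
    classes = np.unique(labels)
    means = np.array([feats[labels == c].mean(axis=0) for c in classes])
    dists = [np.linalg.norm(means[i] - means[j])
             for i in range(len(classes)) for j in range(i + 1, len(classes))]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)
centers = np.array([[0, 0], [4, 0], [0, 4]], dtype=float)
feats = centers[labels] + 0.1 * rng.standard_normal((300, 2))
score = inter_class_dispersion(feats, labels)
```

Such a score needs only unlabeled-target features once class means are estimated, which is what makes it usable without OOD ground truth.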
( 2
min )
Neural operators have emerged as a powerful tool for solving partial
differential equations in the context of scientific machine learning. Here, we
implement and train a modified Fourier neural operator as a surrogate solver
for electromagnetic scattering problems and compare its data efficiency to
existing methods. We further demonstrate its application to the gradient-based
nanophotonic inverse design of free-form, fully three-dimensional
electromagnetic scatterers, an area that has so far eluded the application of
deep learning techniques.
( 2
min )
We consider a Multi-Armed Bandit problem in which the rewards are
non-stationary and are dependent on past actions and potentially on past
contexts. At the heart of our method, we employ a recurrent neural network,
which models these sequences. In order to balance between exploration and
exploitation, we present an energy minimization term that prevents the neural
network from becoming too confident in support of a certain action. This term
provably limits the gap between the maximal and minimal probabilities assigned
by the network. In a diverse set of experiments, we demonstrate that our method
is at least as effective as methods suggested to solve the sub-problem of
Rotting Bandits, and can solve intuitive extensions of various benchmark
problems. We share our implementation at
https://github.com/rotmanmi/Energy-Regularized-RNN.
( 2
min )
This paper presents a framework for training an agent to actively request
help in object-goal navigation tasks, with feedback indicating the location of
the target object in its field of view. To make the agent more robust in
scenarios where a teacher may not always be available, the proposed training
curriculum includes a mix of episodes with and without feedback. The results
show that this approach improves the agent's performance, even in the absence
of feedback.
( 2
min )
Quantitative characterizations and estimations of uncertainty are of
fundamental importance in optimization and decision-making processes. Herein,
we propose intuitive scores, which we call certainty and doubt, that can be
used in both a Bayesian and frequentist framework to assess and compare the
quality and uncertainty of predictions in (multi-)classification decision
machine learning problems.
( 2
min )
We propose a model to forecast large realized covariance matrices of returns,
applying it to the constituents of the S\&P 500 daily. To address the curse of
dimensionality, we decompose the return covariance matrix using standard
firm-level factors (e.g., size, value, and profitability) and use sectoral
restrictions in the residual covariance matrix. This restricted model is then
estimated using vector heterogeneous autoregressive (VHAR) models with the
least absolute shrinkage and selection operator (LASSO). Our methodology
improves forecasting precision relative to standard benchmarks and leads to
better estimates of minimum variance portfolios.
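The HAR regressor structure (yesterday's value plus trailing weekly and monthly averages) can be sketched on a synthetic persistent series; plain least squares stands in for the paper's VHAR-with-LASSO estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 600
# Synthetic daily realized-variance-like series with persistence.
rv = np.zeros(T)
for t in range(1, T):
    rv[t] = 0.1 + 0.7 * rv[t - 1] + 0.05 * rng.standard_normal()**2

def har_features(x, t):
    """HAR regressors at day t: yesterday's value, and trailing
    weekly (5-day) and monthly (22-day) averages."""
    return np.array([1.0, x[t - 1], x[t - 5:t].mean(), x[t - 22:t].mean()])

X = np.array([har_features(rv, t) for t in range(22, T)])
y = rv[22:T]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS; the paper adds LASSO shrinkage
pred = X @ beta
r2 = 1 - ((y - pred)**2).sum() / ((y - y.mean())**2).sum()
```

In the paper this structure is applied elementwise to factor and residual covariances, with LASSO taming the high-dimensional coefficient vector.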
( 2
min )
We propose an adjusted Wasserstein distributionally robust estimator -- based
on a nonlinear transformation of the Wasserstein distributionally robust (WDRO)
estimator in statistical learning. This transformation will improve the
statistical performance of WDRO because the adjusted WDRO estimator is
asymptotically unbiased and has an asymptotically smaller mean squared error.
The adjusted WDRO will not mitigate the out-of-sample performance guarantee of
WDRO. Sufficient conditions for the existence of the adjusted WDRO estimator
are presented, and the procedure for the computation of the adjusted WDRO
estimator is given. Specifically, we will show how the adjusted WDRO estimator
is developed in the generalized linear model. Numerical experiments demonstrate
the favorable practical performance of the adjusted estimator over the classic
one.
( 2
min )
We are in the age of AI. I was wondering if there are any projects on the horizon for subtitle converters that can take an image-based subtitle format like VobSub or HDMV PGS and turn it into SRT subs?
Microsoft just released a highly impressive OCR model with 558 million parameters, named TrOCR-LARGE.
TrOCR is a model that uses an image Transformer encoder and an autoregressive text Transformer decoder to perform optical character recognition (OCR). It is pre-trained in 2 stages before being fine-tuned on downstream datasets.
Study of Microsoft's TrOCR
Hugging Face Documentation
GitHub Source Code and code direct from Microsoft
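A converter along these lines factors into two parts: OCR on each subtitle bitmap (where TrOCR would slot in) and SRT assembly, which is pure formatting. A sketch with a stub in place of the model (function names and the stub `ocr` are illustrative, not an existing tool's API):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def cues_to_srt(cues, ocr):
    """cues: list of (start_s, end_s, image); `ocr` maps an image to text.
    In a real converter `ocr` would wrap TrOCR inference on each cue bitmap."""
    blocks = []
    for i, (start, end, image) in enumerate(cues, start=1):
        text = ocr(image)
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

# Stub OCR standing in for the TrOCR model; cue "images" are placeholders.
demo = cues_to_srt([(1.0, 2.5, "img0"), (3.0, 4.25, "img1")],
                   ocr=lambda img: f"line for {img}")
```

The hard part is the OCR quality on stylized subtitle fonts, which is exactly where a large model like TrOCR-LARGE could move the needle over classic engines.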
submitted by /u/objectivelywrongbro
[link] [comments]
( 43
min )
submitted by /u/tlubz
[link] [comments]
( 42
min )
submitted by /u/StevenVincentOne
[link] [comments]
( 42
min )
submitted by /u/BrosephSmithSr
[link] [comments]
( 42
min )
submitted by /u/wgmimedia
[link] [comments]
( 46
min )
submitted by /u/MLC_Money
[link] [comments]
( 43
min )
Predictive maintenance is a data-driven maintenance strategy for monitoring industrial assets in order to detect anomalies in equipment operations and health that could lead to equipment failures. Through proactive monitoring of an asset’s condition, maintenance personnel can be alerted before issues occur, thereby avoiding costly unplanned downtime, which in turn leads to an increase in […]
( 10
min )
I paired two copies of gpt-3.5: one plays the role of the oracle that answers yes/no questions, the other plays the role of the guesser that asks them. I wanted to see if gpt-3.5 would perform well on this "dynamic" task -- i.e. rather than a fixed test set with one good answer, 20 questions can unfold along many paths, depending on the questions being asked.
The result is poor: 68 / 1823.
20 Questions forces the guesser to be cohesive in a long chain of yes / no predicates. You want an actually difficult and consistent world model? This is a good one that is combinatorially complex.
...
20 Questions (and other interactive, self-play tasks) is worth looking at in evaluating LLMs.
for more details see blog post: https://evanthebouncy.medium.com/llm-self-play-on-20-questions-dee7a8c63377
I'd be happy to answer some questions here as well
--evan
submitted by /u/evanthebouncy
[link] [comments]
( 44
min )
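For anyone curious about the setup, the self-play loop is simple to sketch. Below is a minimal, hedged version that is generic over chat-model wrappers: `ask_oracle` and `ask_guesser` are hypothetical stand-ins for the two gpt-3.5 sessions, not the author's actual code.

```python
def play_20_questions(ask_oracle, ask_guesser, max_turns=20):
    """Run one game of 20 Questions between two model wrappers.

    ask_oracle(question) -> "yes" / "no"   (knows the secret object)
    ask_guesser(history) -> next question  (sees all Q/A pairs so far)

    Returns (turns_used, history) on success, (None, history) on failure.
    """
    history = []
    for turn in range(max_turns):
        question = ask_guesser(history)
        answer = ask_oracle(question)
        history.append((question, answer))
        # A direct guess of the form "is it X?" answered "yes" ends the game.
        if question.lower().startswith("is it ") and answer == "yes":
            return turn + 1, history
    return None, history  # guesser ran out of questions
```

In the real experiment both callables would wrap API calls to separate chat sessions; the scoring (68 / 1823) would come from running this loop once per secret object.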
Auto-Analyst leverages the power of cutting-edge Large Language Models (LLMs) to revolutionize data analytics. This powerful UI tool simplifies the data analysis process, eliminating the need for complex coding.
🔎 Key Features of Auto-Analyst:
Streamlined data analysis process utilizing advanced AI technology and LLMs
Enhanced productivity and efficiency through intuitive language-based commands
Increased accessibility to data analysis for professionals across industries
🔗 Want to explore and contribute to the project? Head over to the GitHub repo: https://github.com/aadityaubhat/auto-analyst
submitted by /u/aadityaubhat
[link] [comments]
( 45
min )
submitted by /u/Cool_Abbreviations_9
[link] [comments]
( 50
min )
submitted by /u/tamilupk
[link] [comments]
( 50
min )
submitted by /u/Last_Salad_5080
[link] [comments]
( 41
min )
submitted by /u/katiecharm
[link] [comments]
( 52
min )
submitted by /u/robotphilanthropist
[link] [comments]
( 41
min )
The demand for energy continues to rise globally, while the world is searching for cleaner and more efficient energy sources. Fusion energy is one of the most promising options as it offers an abundant and environment-friendly energy source with almost zero carbon emissions. The industry is expected to boom exponentially in the coming years due to increasing… Read More »Explore the Trends and Product Developments in the Growing Fusion Energy Industry
The post Explore the Trends and Product Developments in the Growing Fusion Energy Industry appeared first on Data Science Central.
( 20
min )
There has been much talk of ethics in AI recently. Some enterprises and vendors and consultants have talked of the need for a chief AI ethics officer. On the face of it, it sounds like a good idea, but I don’t think it is. Note that I am not arguing against ethical AI or responsible AI… Read More »Why We Don’t Need a Chief AI Ethics Officer and What We Need Instead
The post Why We Don’t Need a Chief AI Ethics Officer and What We Need Instead appeared first on Data Science Central.
( 19
min )
This is the final (?) blog in the series on how technologies like AI and ChatGPT are fueling a fundamental transformation of our educational systems and institutions. We need a plan – and quickly – to update our educational systems and institutions in an age where AI and Big Data are asserting a more significant… Read More »Future of Education: Application not Regurgitation of Knowledge – Part III
The post Future of Education: Application not Regurgitation of Knowledge – Part III appeared first on Data Science Central.
( 21
min )
Regular physics-informed neural networks (PINNs) predict the solution of
partial differential equations using sparse labeled data but only over a single
domain. On the other hand, fully supervised learning models are first trained
usually over a few thousand domains with known solutions (i.e., labeled data)
and then predict the solution over a few hundred unseen domains.
Physics-informed PointNet (PIPN) is primarily designed to fill this gap between
PINNs (as weakly supervised learning models) and fully supervised learning
models. In this article, we demonstrate that PIPN predicts the solution of
desired partial differential equations over a few hundred domains
simultaneously, while it only uses sparse labeled data. This framework benefits
fast geometric designs in the industry when only sparse labeled data are
available. Particularly, we show that PIPN predicts the solution of a plane
stress problem over more than 500 domains with different geometries,
simultaneously. Moreover, we pioneer implementing the concept of remarkable
batch size (i.e., the number of geometries fed into PIPN at each sub-epoch)
into PIPN. Specifically, we try batch sizes of 7, 14, 19, 38, 76, and 133.
Additionally, the effect of the PIPN size, symmetric function in the PIPN
architecture, and static and dynamic weights for the component of the sparse
labeled data in the loss function are investigated.
( 2
min )
Spiking neural networks (SNNs) have gained attention as a promising
alternative to traditional artificial neural networks (ANNs) due to their
potential for energy efficiency and their ability to model spiking behavior in
biological systems. However, the training of SNNs is still a challenging
problem, and new techniques are needed to improve their performance. In this
paper, we study the impact of skip connections on SNNs and propose a
hyperparameter optimization technique that adapts models from ANN to SNN. We
demonstrate that optimizing the position, type, and number of skip connections
can significantly improve the accuracy and efficiency of SNNs by enabling
faster convergence and increasing information flow through the network. Our
results show an average +8% accuracy increase on the CIFAR-10-DVS and DVS128
Gesture datasets when adapting multiple state-of-the-art models.
( 2
min )
We show that the representation cost of fully connected neural networks with
homogeneous nonlinearities - which describes the implicit bias in function
space of networks with $L_2$-regularization or with losses such as the
cross-entropy - converges as the depth of the network goes to infinity to a
notion of rank over nonlinear functions. We then inquire under which conditions
the global minima of the loss recover the `true' rank of the data: we show that
for too large depths the global minimum will be approximately rank 1
(underestimating the rank); we then argue that there is a range of depths which
grows with the number of datapoints where the true rank is recovered. Finally,
we discuss the effect of the rank of a classifier on the topology of the
resulting class boundaries and show that autoencoders with optimal nonlinear
rank are naturally denoising.
( 2
min )
submitted by /u/Notalabel_4566
[link] [comments]
( 41
min )
submitted by /u/zhaulted
[link] [comments]
( 41
min )
submitted by /u/loopuleasa
[link] [comments]
( 46
min )
You can see the actual conversations here:
GPT-4: https://imgur.com/a/TxM0Rb1
Google’s Bard: https://imgur.com/a/ZWcoe2o
Both GPT-4 and Google's Bard AI were asked to generate a list of seven fantasy RPG elements, along with a color to represent each element. GPT-4's output was more detailed, providing a clear and engaging list of elements that incorporated aspects of the natural world, light and shadow, and the arcane. Each element was thoughtfully described and connected to a specific color, allowing for a diverse and dynamic RPG experience.
Google's Bard AI initially misunderstood the question and required clarification before providing an acceptable list of elements. Its output was simpler and more straightforward. While these elements could form the basis of a fantasy world, the…
( 45
min )
submitted by /u/ProudGirlDad2323
[link] [comments]
( 42
min )
submitted by /u/coolbern
[link] [comments]
( 45
min )
submitted by /u/Impressive-Ad-8964
[link] [comments]
( 42
min )
submitted by /u/IrritablyGrim
[link] [comments]
( 43
min )
submitted by /u/TernaryJimbo
[link] [comments]
( 46
min )
submitted by /u/XiaolongWang
[link] [comments]
( 45
min )
submitted by /u/stoniejohnson
[link] [comments]
( 43
min )
The plot shows the evolution of the cost function over epochs.
The math is implemented in my own code and uses backpropagation.
submitted by /u/helpmeihatemyself
[link] [comments]
( 41
min )
submitted by /u/ManuelRodriguez331
[link] [comments]
( 41
min )
submitted by /u/Possible_Being_3189
[link] [comments]
( 42
min )
submitted by /u/Tchoupitoulas_Street
[link] [comments]
( 41
min )
submitted by /u/TheExtimate
[link] [comments]
( 45
min )
submitted by /u/GraspingSonder
[link] [comments]
( 42
min )
submitted by /u/Impressive-Ad-8964
[link] [comments]
( 43
min )
submitted by /u/davidbun
[link] [comments]
( 45
min )
submitted by /u/FT05-biggoye
[link] [comments]
( 44
min )
I've actually reproduced quite a bit of the functionality of WebGPT in langchain with gpt-3.5 by exposing both a Google tool and a scrape tool to the LLM; however, it's not as good or as polished as what's seen in the WebGPT paper back from 2021.
https://openai.com/research/webgpt
The ability to follow links on pages and so on is extremely nice. Poked around hugging face for something similar, nothing. It's not like WebGPT is exactly on the openai API either.
submitted by /u/light24bulbs
[link] [comments]
( 43
min )
submitted by /u/radi-cho
[link] [comments]
( 44
min )
Paper: https://arxiv.org/abs/2303.11366
Blog: https://nanothoughts.substack.com/p/reflecting-on-reflexion
Github: https://github.com/noahshinn024/reflexion-human-eval
Twitter: https://twitter.com/johnjnay/status/1639362071807549446?s=20
Abstract:
Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making…
( 50
min )
submitted by /u/ABDULKADER90H
[link] [comments]
( 41
min )
submitted by /u/abstractcontrol
[link] [comments]
( 41
min )
Applying deep learning concepts from image detection and graph theory has
greatly advanced protein-ligand binding affinity prediction, a challenge with
enormous ramifications for both drug discovery and protein engineering. We
build upon these advances by designing a novel deep learning architecture
consisting of a 3-dimensional convolutional neural network utilizing
channel-wise attention and two graph convolutional networks utilizing
attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based
Convolutional Neural Network) obtains state-of-the-art results on the PDBbind
v.2016 core set, the most widely recognized benchmark in the field. We
extensively assess the generalizability of our model using multiple train-test
splits, each of which maximizes differences between either protein structures,
protein sequences, or ligand extended-connectivity fingerprints of complexes in
the training and test sets. Furthermore, we perform 10-fold cross-validation
with a similarity cutoff between SMILES strings of ligands in the training and
test sets, and also evaluate the performance of HAC-Net on lower-quality data.
We envision that this model can be extended to a broad range of supervised
learning problems related to structure-based biomolecular property prediction.
All of our software is available as open source at
https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is
available through PyPI.
( 2
min )
submitted by /u/nickb
[link] [comments]
( 41
min )
submitted by /u/HastyNationality
[link] [comments]
( 41
min )
submitted by /u/Geeki_dude
[link] [comments]
( 41
min )
submitted by /u/pigeonsusemagic
[link] [comments]
( 42
min )
submitted by /u/adititalksai
[link] [comments]
( 42
min )
submitted by /u/MI6Section13
[link] [comments]
( 42
min )
submitted by /u/ai_jobs
[link] [comments]
( 41
min )
Databricks shows that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in less than three hours on one machine, using high-quality training data.
They fine-tuned GPT-J on the Alpaca dataset.
Blog: https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
Github: https://github.com/databrickslabs/dolly
submitted by /u/austintackaberry
[link] [comments]
( 50
min )
I spoke to the ex-ML Platform Lead at Stitch Fix to understand the practical challenges in building and managing an ML platform, and where someone starting from scratch should ideally begin.
What do you think?
link: https://www.youtube.com/watch?v=TbP5G188kX8
submitted by /u/DiligentEmployee3610
[link] [comments]
( 43
min )
K-Planes was released with PyTorch code only and CoBaFa didn't provide code, so I implemented both of them in a short repo with CUDA acceleration: https://github.com/loicmagne/tinynerf
submitted by /u/Lairv
[link] [comments]
( 43
min )
Hey! We're creating an open-source training framework focused on evolutionary hyperparameter optimization for RL. This offers a 10x speed-up over other HPO methods!
Check it out and please get involved if you would be interested in working on this - any contributions are super valuable.
We believe this can change the way we train our models, and democratise access to RL for people and businesses who don't currently have the resources for it!
GitHub: https://github.com/AgileRL/AgileRL
submitted by /u/nicku_a
[link] [comments]
( 46
min )
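The core idea of evolutionary HPO can be sketched in a few lines: keep a population of hyperparameter settings, select the fittest, and mutate them. This is a generic mutate-and-select loop for a single hyperparameter, not AgileRL's actual API; `fitness` is a hypothetical scoring function (e.g. average episode return after a short training run).

```python
import random

def evolve_hyperparams(fitness, init_lr=1e-3, pop_size=8, generations=20, seed=0):
    """Toy evolutionary HPO loop: mutate a learning rate, keep the fittest.

    fitness(lr) -> float, higher is better. A generic sketch of the
    mutate-and-select idea, not AgileRL's implementation.
    """
    rng = random.Random(seed)
    # Initial population: log-uniform-ish spread around the starting value.
    population = [init_lr * rng.uniform(0.1, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        elites = scored[: pop_size // 2]                           # selection
        children = [lr * rng.uniform(0.5, 2.0) for lr in elites]   # mutation
        population = elites + children                             # next gen
    return max(population, key=fitness)
```

Because elites survive unchanged each generation, the best-ever setting is never lost, so the loop improves monotonically in the best fitness.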
submitted by /u/blabboy
[link] [comments]
( 45
min )
hey folks, happy Friday! I'd love some feedback on my recent project: a minimal example of using RLHF on language models to improve human alignment.
The goal is to compare with vanilla GPT-2 and supervised fine-tuned GPT-2 to see how much RLHF can benefit small models. I also hope this project shows the minimum requirements to build an RLHF training pipeline for LLMs.
Github: https://github.com/ethanyanjiali/minChatGPT Demo: https://colab.research.google.com/drive/1LR1sbWTyaNAmTZ1g1M2tpmU_pFw1lyEX?usp=sharing
Thanks a lot for any suggestions and feedback!
submitted by /u/liyanjia92
[link] [comments]
( 47
min )
Introduction Java Concurrency API is a set of Java packages and classes developed to create multi-threaded applications. It was introduced in Java 5 and aims to make it easier to write concurrent and parallel code in Java. The Java Concurrency API offers classes and utilities that allow developers to create and manage threads, synchronize access… Read More »Developing Multi-Threaded Applications with Java Concurrency API
The post Developing Multi-Threaded Applications with Java Concurrency API appeared first on Data Science Central.
( 24
min )
An update on our findings, the actions we’ve taken, and technical details of the bug.
( 3
min )
Although reinforcement learning has seen tremendous success recently, this
kind of trial-and-error learning can be impractical or inefficient in complex
environments. The use of demonstrations, on the other hand, enables agents to
benefit from expert knowledge rather than having to discover the best action to
take through exploration. In this survey, we discuss the advantages of using
demonstrations in sequential decision making, various ways to apply
demonstrations in learning-based decision making paradigms (for example,
reinforcement learning and planning in the learned models), and how to collect
the demonstrations in various scenarios. Additionally, we exemplify a practical
pipeline for generating and utilizing demonstrations in the recently proposed
ManiSkill robot learning benchmark.
( 2
min )
We investigate the optimization of multilayer perceptrons on symmetric data.
We compare the strategy of constraining the architecture to be equivariant to
that of using augmentation. We show that, under natural assumptions on the loss
and non-linearities, the sets of equivariant stationary points are identical
for the two strategies, and that the set of equivariant layers is invariant
under the gradient flow for augmented models. Finally, we show that stationary
points may be unstable for augmented training although they are stable for the
equivariant models.
( 2
min )
People nowadays use search engines like Google, Yahoo, and Bing to find
information on the Internet. Due to the explosion in data, it is helpful for users
if they are provided relevant summaries of the search results rather than just
links to webpages. Text summarization has become a vital approach to help
consumers swiftly grasp vast amounts of information. In this paper, different
pre-trained models for text summarization are evaluated on different datasets.
Specifically, we have used three different pre-trained models, namely,
google/pegasus-cnn-dailymail, T5-base, facebook/bart-large-cnn. We have
considered three different datasets, namely, CNN-dailymail, SAMSum and BillSum
to get the output from the above three models. The pre-trained models are
compared over these different datasets, each of 2000 examples, through ROUGE
and BLEU metrics.
( 2
min )
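For reference, the ROUGE metric used above measures n-gram overlap between a candidate summary and a reference. A simplified, pure-Python ROUGE-1 F1 looks like this (real evaluations typically use a dedicated library such as rouge-score, which also handles stemming and ROUGE-2/L):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between a candidate summary
    and a reference (no stemming or stopword handling)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

BLEU is the precision-oriented counterpart (with a brevity penalty); ROUGE, shown here, is recall-oriented, which is why it is the standard choice for summarization.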
With the increase of distance learning, in general, and e-learning, in
particular, having a system capable of determining the engagement of students
is of paramount importance, and one of the biggest challenges, for teachers,
researchers and policy makers alike. Here, we present a system to detect
the engagement level of the students. It uses only information provided by the
typical built-in web-camera present in a laptop computer, and was designed to
work in real time. We combine information about the movements of the eyes and
head, and facial emotions to produce a concentration index with three classes
of engagement: "very engaged", "nominally engaged" and "not engaged at all".
The system was tested in a typical e-learning scenario, and the results show
that it correctly identifies each period of time where students were "very
engaged", "nominally engaged" and "not engaged at all". Additionally, the
results also show that the students with best scores also have higher
concentration indexes.
( 2
min )
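The combination step described above (eye movements, head movements, and facial emotion fused into a three-class concentration index) could look roughly like the sketch below. The weights and thresholds are illustrative assumptions, not the paper's values.

```python
def concentration_index(gaze_on_screen: float, head_stability: float,
                        emotion_valence: float) -> str:
    """Combine normalized webcam-derived signals (each in [0, 1]) into a
    concentration index with three engagement classes.

    Weights and cutoffs are illustrative, not the paper's actual values.
    """
    index = 0.5 * gaze_on_screen + 0.3 * head_stability + 0.2 * emotion_valence
    if index >= 0.66:
        return "very engaged"
    if index >= 0.33:
        return "nominally engaged"
    return "not engaged at all"
```

In the real system each input would itself come from a model (gaze estimation, head-pose tracking, emotion recognition) run per time window on the webcam stream.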
In applications of offline reinforcement learning to observational data, such
as in healthcare or education, a general concern is that observed actions might
be affected by unobserved factors, inducing confounding and biasing estimates
derived under the assumption of a perfect Markov decision process (MDP) model.
Here we tackle this by considering off-policy evaluation in a partially
observed MDP (POMDP). Specifically, we consider estimating the value of a given
target policy in a POMDP given trajectories with only partial state
observations generated by a different and unknown policy that may depend on the
unobserved state. We tackle two questions: what conditions allow us to identify
the target policy value from the observed data and, given identification, how
to best estimate it. To answer these, we extend the framework of proximal
causal inference to our POMDP setting, providing a variety of settings where
identification is made possible by the existence of so-called bridge functions.
We then show how to construct semiparametrically efficient estimators in these
settings. We term the resulting framework proximal reinforcement learning
(PRL). We demonstrate the benefits of PRL in an extensive simulation study and
on the problem of sepsis management.
( 2
min )
In the realm of cybersecurity, intrusion detection systems (IDS) detect and
prevent attacks based on collected computer and network data. In recent
research, IDS models have been constructed using machine learning (ML) and deep
learning (DL) methods such as Random Forest (RF) and deep neural networks
(DNN). Feature selection (FS) can be used to construct faster, more
interpretable, and more accurate models. We look at three different FS
techniques: RF information gain (RF-IG), correlation feature selection using
the Bat Algorithm (CFS-BA), and CFS using the Aquila Optimizer (CFS-AO). Our
results show CFS-BA to be the most efficient of the FS methods, building in 55%
of the time of the best RF-IG model while achieving 99.99% of its accuracy.
This reinforces prior contributions attesting to CFS-BA's accuracy while
building upon the relationship between subset size, CFS score, and RF-IG score
in final results.
( 2
min )
Recently, significant progress has been made regarding the statistical
understanding of artificial neural networks (ANNs). ANNs are motivated by the
functioning of the brain, but differ in several crucial aspects. In particular,
the locality in the updating rule of the connection parameters in biological
neural networks (BNNs) makes it biologically implausible that the learning of
the brain is based on gradient descent. In this work, we look at the brain as a
statistical method for supervised learning. The main contribution is to relate
the local updating rule of the connection parameters in BNNs to a zero-order
optimization method. It is shown that the expected values of the iterates
implement a modification of gradient descent.
( 2
min )
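To make "zero-order optimization method" concrete: such methods estimate a descent direction from function evaluations alone, with no access to gradients. A minimal coordinate-wise finite-difference sketch is below; the paper's biologically motivated rule differs (it is perturbation-based), but the principle of descending using only function values is the same.

```python
def zeroth_order_minimize(f, x, lr=0.1, mu=1e-5, steps=100):
    """Minimize f using only function evaluations (zero-order).

    The gradient is estimated by coordinate-wise finite differences,
    then a plain gradient-descent step is taken. Illustrates the class
    of methods the paper relates BNN weight updates to, not its rule.
    """
    for _ in range(steps):
        fx = f(x)
        grad = []
        for i in range(len(x)):
            xp = list(x)
            xp[i] += mu                     # perturb one coordinate
            grad.append((f(xp) - fx) / mu)  # directional slope estimate
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x
```

On a quadratic the estimated slope equals the true gradient up to an O(mu) bias, so the iterates behave like ordinary gradient descent — the "modification of gradient descent" flavor the abstract refers to.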
Hybrid ventilation is an energy-efficient solution to provide fresh air for
most climates, given that it has a reliable control system. To operate such
systems optimally, a high-fidelity control-oriented model is required. It
should enable near-real time forecast of the indoor air temperature based on
operational conditions such as window opening and HVAC operating schedules.
However, physics-based control-oriented models (i.e., white-box models) are
labour-intensive and computationally expensive. Alternatively, black-box models
based on artificial neural networks can be trained to be good estimators for
building dynamics. This paper investigates the capabilities of a deep neural
network (DNN), which is a multivariate multi-head attention-based long
short-term memory (LSTM) encoder-decoder neural network, to predict indoor air
temperature when windows are opened or closed. Training and test data are
generated from a detailed multi-zone office building model (EnergyPlus).
Pseudo-random signals are used for the indoor air temperature setpoints and
window opening instances. The results indicate that the DNN is able to
accurately predict the indoor air temperature of five zones whenever windows
are opened or closed. The prediction error plateaus after the 24th step ahead
prediction (6 hr ahead prediction).
( 2
min )
A fundamental problem in quantum physics is to encode functions that are
completely anti-symmetric under permutations of identical particles. The Barron
space consists of high-dimensional functions that can be parameterized by
infinite neural networks with one hidden layer. By explicitly encoding the
anti-symmetric structure, we prove that the anti-symmetric functions which
belong to the Barron space can be efficiently approximated with sums of
determinants. This yields a factorial improvement in complexity compared to the
standard representation in the Barron space and provides a theoretical
explanation for the effectiveness of determinant-based architectures in
ab-initio quantum chemistry.
( 2
min )
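The determinant construction referenced here is easy to demonstrate numerically: applying one-particle functions to each particle coordinate and taking the determinant (a Slater-determinant-style ansatz) yields a function that flips sign under any particle exchange. A toy check, illustrative only and not the paper's construction:

```python
def det(m):
    """Recursive Laplace-expansion determinant (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def antisymmetric(phis, xs):
    """Slater-determinant-style construction: evaluate one-particle
    functions phi_j at particle coordinates x_i and take the determinant.
    Swapping two particles swaps two rows, so the value flips sign."""
    return det([[phi(x) for phi in phis] for x in xs])
```

With monomials 1, x, x^2 this is the Vandermonde determinant, whose product formula makes the antisymmetry explicit.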
In this paper, we introduce a new class of functions on $\mathbb{R}$ that is
closed under composition, and contains the logistic sigmoid function. We use
this class to show that any 1-dimensional neural network of arbitrary depth
with logistic sigmoid activation functions has at most three fixed points.
While such neural networks are far from real world applications, we are able to
completely understand their fixed points, providing a foundation to the much
needed connection between application and theory of deep neural networks.
( 2
min )
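The "at most three fixed points" bound can be checked numerically: the sigmoid itself has exactly one fixed point, while a single steep sigmoid layer already attains three, so the bound is tight. A rough grid-scan sketch (a numerical illustration, not a proof):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def count_fixed_points(f, lo=-10.0, hi=10.0, steps=20000):
    """Count fixed points of f on [lo, hi] by scanning sign changes of
    g(x) = f(x) - x on a fine grid (also counting exact grid zeros)."""
    count = 0
    prev = f(lo) - lo
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        cur = f(x) - x
        if prev == 0.0 or prev * cur < 0:
            count += 1
        prev = cur
    return count
```

Here `sigmoid(10 * (x - 0.5))` is a depth-1 "network" with weight 10 and bias -5: it fixes x = 0.5 and picks up two more fixed points near 0 and 1, matching the theorem's ceiling of three.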
The successes of foundation models such as ChatGPT and AlphaFold have spurred
significant interest in building similar models for electronic medical records
(EMRs) to improve patient care and hospital operations. However, recent hype
has obscured critical gaps in our understanding of these models' capabilities.
We review over 80 foundation models trained on non-imaging EMR data (i.e.
clinical text and/or structured data) and create a taxonomy delineating their
architectures, training data, and potential use cases. We find that most models
are trained on small, narrowly-scoped clinical datasets (e.g. MIMIC-III) or
broad, public biomedical corpora (e.g. PubMed) and are evaluated on tasks that
do not provide meaningful insights on their usefulness to health systems. In
light of these findings, we propose an improved evaluation framework for
measuring the benefits of clinical foundation models that is more closely
grounded to metrics that matter in healthcare.
( 2
min )
Feature matching is a basic step in matching different datasets. This
article proposes a new hybrid model: a pretrained Natural Language
Processing (NLP) model called BERT used in parallel with a statistical
model based on Jaccard similarity to measure the similarity between lists of
features from two different datasets. This reduces the time required to search
for correlations or manually match each feature from one dataset to another.
( 2
min )
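The statistical half of the hybrid is straightforward to sketch: compare the token sets of feature names with Jaccard similarity and match greedily above a threshold. The BERT half (embedding similarity run in parallel) is omitted here, and the function names and threshold are illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the underscore-token sets of two
    feature names: |intersection| / |union|."""
    sa, sb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(sa & sb) / len(sa | sb)

def match_features(src, dst, threshold=0.5):
    """Greedily match feature names across two datasets by Jaccard
    similarity, keeping only matches above `threshold`. The paper pairs
    this with BERT embedding similarity; only the statistical half is
    sketched here."""
    matches = {}
    for s in src:
        best = max(dst, key=lambda d: jaccard(s, d))
        if jaccard(s, best) >= threshold:
            matches[s] = best
    return matches
```

The token-set view makes the measure order-invariant, which is why `customer_id` and `id_customer` score a perfect 1.0 despite the different word order.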
In this paper, we propose an algorithmic framework to automatically generate
efficient deep neural networks and optimize their associated hyperparameters.
The framework is based on evolving directed acyclic graphs (DAGs), defining a
more flexible search space than the existing ones in the literature. It allows
mixtures of different classical operations: convolutions, recurrences and dense
layers, but also more newfangled operations such as self-attention. Based on
this search space we propose neighbourhood and evolution search operators to
optimize both the architecture and hyper-parameters of our networks. These
search operators can be used with any metaheuristic capable of handling mixed
search spaces. We tested our algorithmic framework with an evolutionary
algorithm on a time series prediction benchmark. The results demonstrate that
our framework was able to find models outperforming the established baseline on
numerous datasets.
( 2
min )
The Weisfeiler--Lehman (WL) test is a fundamental iterative algorithm for
checking isomorphism of graphs. It has also been observed that it underlies the
design of several graph neural network architectures, whose capabilities and
performance can be understood in terms of the expressive power of this test.
Motivated by recent developments in machine learning applications to datasets
involving three-dimensional objects, we study when the WL test is {\em
complete} for clouds of euclidean points represented by complete distance
graphs, i.e., when it can distinguish, up to isometry, any arbitrary such
cloud.
Our main result states that the $(d-1)$-dimensional WL test is complete for
point clouds in $d$-dimensional Euclidean space, for any $d\ge 2$, and that
only three iterations of the test suffice. Our result is tight for $d = 2, 3$.
We also observe that the $d$-dimensional WL test only requires one iteration to
achieve completeness.
( 2
min )
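For intuition, here is a sketch of 1-dimensional WL refinement on a complete distance graph. (The paper's completeness result concerns the (d-1)-dimensional test; this 1-WL version only illustrates the mechanics of refining node colors by neighbor colors and edge distances.)

```python
def wl_distance_graph(points, iterations=3):
    """1-WL refinement on the complete distance graph of a point cloud:
    each node's color is refined by the multiset of (neighbor color,
    squared distance) pairs. The sorted multiset of final signatures is
    invariant under isometries of the cloud."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    n = len(points)
    colors = [0] * n          # all nodes start with the same color
    sigs = None
    for _ in range(iterations):
        sigs = [
            (colors[i], tuple(sorted((colors[j], d2(points[i], points[j]))
                                     for j in range(n) if j != i)))
            for i in range(n)
        ]
        # Relabel distinct signatures with fresh compact colors.
        palette = {s: c for c, s in enumerate(sorted(set(sigs)))}
        colors = [palette[s] for s in sigs]
    return sorted(sigs)
```

Translating or rotating a cloud preserves all pairwise distances, so congruent clouds produce identical signature multisets, while clouds with different distance structure are separated.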
Emerging AI applications such as ChatGPT, graph convolutional networks, and
other deep neural networks require massive computational resources for training
and inference. Contemporary computing platforms such as CPUs, GPUs, and TPUs
are struggling to keep up with the demands of these AI applications.
Non-coherent optical computing represents a promising approach for light-speed
acceleration of AI workloads. In this paper, we show how cross-layer design can
overcome challenges in non-coherent optical computing platforms. We describe
approaches for optical device engineering, tuning circuit enhancements, and
architectural innovations to adapt optical computing to a variety of AI
workloads. We also discuss techniques for hardware/software co-design that can
intelligently map and adapt AI software to improve its performance on
non-coherent optical computing platforms.
( 2
min )
We study a class of interacting particle systems for implementing a marginal
maximum likelihood estimation (MLE) procedure to optimize over the parameters
of a latent variable model. To do so, we propose a continuous-time interacting
particle system which can be seen as a Langevin diffusion over an extended
state space, where the number of particles acts as the inverse temperature
parameter in classical settings for optimisation. Using Langevin diffusions, we
prove nonasymptotic concentration bounds for the optimisation error of the
maximum marginal likelihood estimator in terms of the number of particles in
the particle system, the number of iterations of the algorithm, and the
step-size parameter for the time discretisation analysis.
( 2
min )
Evaluating the performance of humans is a common need across many
applications, such as in engineering and sports. When evaluating human
performance in completing complex and interactive tasks, the most common way is
to use a metric having been proved efficient for that context, or to use
subjective measurement techniques. However, this can be an error prone and
unreliable process since static metrics cannot capture all the complex contexts
associated with such tasks and biases exist in subjective measurement. The
objective of our research is to create data-driven AI agents as computational
benchmarks to evaluate human performance in solving difficult tasks involving
multiple humans and contextual factors. We demonstrate this within the context
of football performance analysis. We train a generative model based on
Conditional Variational Recurrent Neural Network (VRNN) Model on a large player
and ball tracking dataset. The trained model is used to imitate the
interactions between two teams and predict the performance from each team. Then
the trained Conditional VRNN Model is used as a benchmark to evaluate team
performance. The experimental results on a Premier League football dataset
demonstrate the usefulness of our method compared to the existing
state-of-the-art static metric used in football analytics.
( 2
min )
Machine learning algorithms, especially Neural Networks (NNs), are a valuable
tool used to approximate non-linear relationships, like the AC-Optimal Power
Flow (AC-OPF), with considerable accuracy -- and achieving a speedup of several
orders of magnitude when deployed for use. Often in power systems literature,
the NNs are trained with a fixed dataset generated prior to the training
process. In this paper, we show that adapting the NN training dataset during
training can improve the NN performance and substantially reduce its worst-case
violations. This paper proposes an algorithm that identifies and enriches the
training dataset with critical datapoints that reduce the worst-case violations
and deliver a neural network with improved worst-case performance guarantees.
We demonstrate the performance of our algorithm in four test power systems,
ranging from 39-buses to 162-buses.
( 2
min )
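The enrichment loop described above can be sketched generically: fit a model, find the candidate input with the worst constraint violation, add it to the training set, and repeat. `fit` and `violation` below are hypothetical user-supplied callables; this is a sketch in the spirit of the paper, not its algorithm.

```python
def enrich_training_set(train, candidates, fit, violation, rounds=5):
    """Adaptive dataset enrichment: repeatedly fit a model, locate the
    worst-violating candidate datapoint, and add it to the training set.

    fit(train) -> model
    violation(model, x) -> float  (positive means a constraint violation)

    Generic sketch of the enrich-and-retrain idea, not the paper's method.
    """
    for _ in range(rounds):
        model = fit(train)
        worst = max(candidates, key=lambda x: violation(model, x))
        if violation(model, worst) <= 0:
            break  # no remaining violations among the candidates
        train = train + [worst]
    return fit(train), train
```

The key property is that each round targets the model's current worst case rather than sampling uniformly, which is what shrinks worst-case violations so quickly.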
In this paper we discuss how to evaluate the differences between fitted
logistic regression models across sub-populations. Our motivating example is in
studying computerized diagnosis for learning disabilities, where
sub-populations based on gender may or may not require separate models. In this
context, significance tests for hypotheses of no difference between populations
may provide perverse incentives, as larger variances and smaller samples
increase the probability of not rejecting the null. We argue that equivalence
testing for a prespecified tolerance level on population differences
incentivizes accuracy in the inference. We develop a cascading set of
equivalence tests, in which each test addresses a different aspect of the
model: the way the phenomenon is coded in the regression coefficients, the
individual predictions in the per example log odds ratio and the overall
accuracy in the mean square prediction error. For each equivalence test, we
propose a strategy for setting the equivalence thresholds. The large-sample
approximations are validated using simulations. For diagnosis data, we show
examples of equivalent and non-equivalent models.
( 2
min )
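The equivalence-testing idea above can be illustrated with a standard two one-sided tests (TOST) procedure on an estimated difference. This is a generic sketch under a normal approximation, not the paper's cascading test; the tolerance `delta` and all numbers are invented for illustration.

```python
import math

def tost_equivalence(diff, se, delta, alpha=0.05):
    """Two one-sided tests (TOST): declare the estimated difference
    equivalent to zero within +/- delta if both one-sided null
    hypotheses (diff <= -delta and diff >= +delta) are rejected."""
    z_lower = (diff + delta) / se            # against H0: diff <= -delta
    z_upper = (diff - delta) / se            # against H0: diff >= +delta
    p_lower = 1.0 - 0.5 * (1.0 + math.erf(z_lower / math.sqrt(2.0)))
    p_upper = 0.5 * (1.0 + math.erf(z_upper / math.sqrt(2.0)))
    return max(p_lower, p_upper) < alpha

# A small difference with a tight standard error is declared equivalent
# within tolerance 0.5; the same difference with a large standard error
# is not, so imprecision is no longer rewarded.
equivalent = tost_equivalence(diff=0.05, se=0.1, delta=0.5)
inconclusive = tost_equivalence(diff=0.05, se=0.4, delta=0.5)
```

This inverts the incentive structure the abstract criticizes: with equivalence testing, larger variances make it *harder*, not easier, to conclude the models agree.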
Various recent experimental results show that large language models (LLM)
exhibit emergent abilities that are not present in small models. System
performance is greatly improved after passing a certain critical threshold of
scale. In this letter, we provide a simple explanation for such a phase
transition phenomenon. For this, we model an LLM as a sequence-to-sequence
random function. Instead of using instant generation at each step, we use a
list decoder that keeps a list of candidate sequences at each step and defers
the generation of the output sequence at the end. We show that there is a
critical threshold such that the expected number of erroneous candidate
sequences remains bounded when an LLM is below the threshold, and it grows
exponentially when an LLM is above the threshold. Such a threshold is related
to the basic reproduction number in a contagious disease.
( 2
min )
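The threshold behaviour has a simple branching-process reading (the notation below is ours, introduced for illustration, not necessarily the paper's): if each generation step multiplies the expected number of erroneous candidate sequences by a factor $R$, then after $n$ steps

```latex
E_n \approx E_0 \, R^n
\quad\Longrightarrow\quad
E_n \text{ is }
\begin{cases}
\text{bounded} & \text{if } R \le 1,\\
\text{exponentially growing} & \text{if } R > 1,
\end{cases}
```

mirroring how an epidemic dies out or explodes depending on whether the basic reproduction number is below or above one.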
submitted by /u/thejashGI
[link] [comments]
( 41
min )
submitted by /u/Tao_Dragon
[link] [comments]
( 41
min )
Artificial Intelligence (AI) is a rapidly evolving technology that is revolutionizing many industries, such as business, banking, manufacturing, and healthcare.
https://dailytrendin.blogspot.com/2023/03/artificial-intelligence-is-revolutionizing-the-world-ai-in-business-banking-manufacturing-healthcare.html
submitted by /u/Elon-pictorials
[link] [comments]
( 41
min )
submitted by /u/Calatravo
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/much_successes
[link] [comments]
( 41
min )
submitted by /u/Emilie_Tunc
[link] [comments]
( 42
min )
submitted by /u/vadhavaniyafaijan
[link] [comments]
( 42
min )
submitted by /u/sardoa11
[link] [comments]
( 41
min )
Listen to the podcast episode with Ben Eysenbach from CMU, where we discuss designing simpler and more principled RL algorithms!
submitted by /u/thejashGI
[link] [comments]
( 43
min )
GitHub: https://github.com/radi-cho/noisy-sentences-dataset
We have constructed our dataset to cover representatives from the language families used across Europe.
Germanic - English, German; Romance - French; Slavic - Bulgarian; Turkic - Turkish.
Use case example: Apply language models or other techniques to compare the sentence pairs and reconstruct the original sentences from the augmented ones. You can use a single multilingual solution to solve the challenge or employ multiple models/techniques for the separate languages. Per-word dictionary lookup is also an option.
submitted by /u/radi-cho
[link] [comments]
( 43
min )
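The per-word dictionary lookup mentioned as a baseline can be sketched in a few lines: keep in-vocabulary words and replace the rest with the nearest vocabulary word under Levenshtein distance. The tiny vocabulary and sentence here are invented for illustration, not taken from the dataset.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[-1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def dictionary_correct(sentence, vocab):
    """Per-word lookup: keep in-vocabulary words, replace the rest with
    the nearest vocabulary word under edit distance."""
    return " ".join(
        word if word in vocab
        else min(vocab, key=lambda v: edit_distance(word, v))
        for word in sentence.split())

vocab = {"the", "quick", "brown", "fox", "jumps"}
fixed = dictionary_correct("teh quikc brown fox jmups", vocab)
```

This only handles word-level noise, of course; reconstructing deletions or reorderings is where the language-model approaches come in.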
Blogpost: Introducing Prompt-to-Voice - Describe It to Hear It / Blog / Coqui
There is still space for improvement, but that is an exciting take on voice creation.
I wonder if it'd be open-sourced alongside TTS.
submitted by /u/mamafied
[link] [comments]
( 43
min )
PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.
Today version 2.3 got released: https://github.com/pyg-team/pytorch_geometric/releases/tag/2.3.0
submitted by /u/Balance-
[link] [comments]
( 43
min )
submitted by /u/mrx-ai
[link] [comments]
( 43
min )
submitted by /u/FrereKhan
[link] [comments]
( 45
min )
GitHub: https://github.com/mayooear/gpt4-pdf-chatbot-langchain
Demo video: https://www.youtube.com/watch?v=ih9PBGVVOO4
submitted by /u/radi-cho
[link] [comments]
( 43
min )
New paper by MSR researchers analyzing an early (and less constrained) version of GPT-4. Spicy quote from the abstract:
"Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."
What are everyone's thoughts?
submitted by /u/SWAYYqq
[link] [comments]
( 66
min )
AI Weirdness: the strange side of machine learning
( 2
min )
This is joint post co-written by Leidos and AWS. Leidos is a FORTUNE 500 science and technology solutions leader working to address some of the world’s toughest challenges in the defense, intelligence, homeland security, civil, and healthcare markets. Leidos has partnered with AWS to develop an approach to privacy-preserving, confidential machine learning (ML) modeling where […]
( 11
min )
We humans love it when we use any application on one device, leave some actions unfinished, and use another device logged into the same account and continue the same thing from where we left off. And that all becomes possible due to cloud computing, which has introduced us to the omni-connected systems regardless of the… Read More »Cloud Cost Optimization Best Practices to Cut Spending Without Sacrificing Performance
The post Cloud Cost Optimization Best Practices to Cut Spending Without Sacrificing Performance appeared first on Data Science Central.
( 23
min )
Gamers love games — as do the people who make them. GeForce NOW streams over 1,500 games from the cloud, and with the Game Developers Conference in full swing this week, today’s GFN Thursday celebrates all things games: the tech behind them, the tools that bring them to the cloud, the ways to play them Read article >
( 6
min )
We provide a comprehensive reply to the comment written by Stefan Boettcher
[arXiv:2210.00623] and argue that the comment singles out one particular
non-representative example problem, entirely focusing on the maximum cut
problem (MaxCut) on sparse graphs, for which greedy algorithms are expected to
perform well. Conversely, we highlight the broader algorithmic development
underlying our original work, and (within our original framework) provide
additional numerical results showing sizable improvements over our original
data, thereby refuting the comment's original performance statements.
Furthermore, it has already been shown that physics-inspired graph neural
networks (PI-GNNs) can outperform greedy algorithms, in particular on hard,
dense instances. We also argue that the internal (parallel) anatomy of graph
neural networks is very different from the (sequential) nature of greedy
algorithms, and (based on their usage at the scale of real-world social
networks) point out that graph neural networks have demonstrated their
potential for superior scalability compared to existing heuristics such as
extremal optimization. Finally, we conclude by highlighting the conceptual novelty
of our work and outline some potential extensions.
( 3
min )
The constitutive behavior of polymeric materials is often modeled by finite
linear viscoelastic (FLV) or quasi-linear viscoelastic (QLV) models. These
popular models are simplifications that typically cannot accurately capture the
nonlinear viscoelastic behavior of materials. For example, the success of
attempts to capture strain rate-dependent behavior has been limited so far. To
overcome this problem, we introduce viscoelastic Constitutive Artificial Neural
Networks (vCANNs), a novel physics-informed machine learning framework for
anisotropic nonlinear viscoelasticity at finite strains. vCANNs rely on the
concept of generalized Maxwell models enhanced with nonlinear strain
(rate)-dependent properties represented by neural networks. The flexibility of
vCANNs enables them to automatically identify accurate and sparse constitutive
models of a broad range of materials. To test vCANNs, we trained them on
stress-strain data from Polyvinyl Butyral, the electro-active polymers VHB 4910
and 4905, and a biological tissue, the rectus abdominis muscle. Different
loading conditions were considered, including relaxation tests, cyclic
tension-compression tests, and blast loads. We demonstrate that vCANNs can
learn to capture the behavior of all these materials accurately and
computationally efficiently without human guidance.
( 2
min )
We propose learning a depth covariance function with applications to
geometric vision tasks. Given RGB images as input, the covariance function can
be flexibly used to define priors over depth functions, predictive
distributions given observations, and methods for active point selection. We
leverage these techniques for a selection of downstream tasks: depth
completion, bundle adjustment, and monocular dense visual odometry.
( 2
min )
While end-to-end learning systems are rapidly gaining capabilities and
popularity, the increasing computational demands for deploying such systems,
along with a lack of flexibility, adaptability, explainability, reasoning and
verification capabilities, require new types of architectures. Here we
introduce a classification of hybrid systems which, based on an analysis of
human knowledge and intelligence, combines neural learning with various types
of knowledge and knowledge sources. We present the Thrill-K architecture as a
prototypical solution for integrating instantaneous knowledge, standby
knowledge and external knowledge sources in a framework capable of inference,
learning and intelligent control.
( 2
min )
This paper investigates the universal approximation capabilities of
Hamiltonian Deep Neural Networks (HDNNs) that arise from the discretization of
Hamiltonian Neural Ordinary Differential Equations. Recently, it has been shown
that HDNNs enjoy, by design, non-vanishing gradients, which provide numerical
stability during training. However, although HDNNs have demonstrated
state-of-the-art performance in several applications, a comprehensive study to
quantify their expressivity is missing. In this regard, we provide a universal
approximation theorem for HDNNs and prove that a portion of the flow of HDNNs
can approximate arbitrary well any continuous function over a compact domain.
This result provides a solid theoretical foundation for the practical use of
HDNNs.
( 2
min )
Head MRI pre-processing involves converting raw images to an
intensity-normalized, skull-stripped brain in a standard coordinate space. In
this paper, we propose an end-to-end weakly supervised learning approach,
called Neural Pre-processing (NPP), for solving all three sub-tasks
simultaneously via a neural network, trained on a large dataset without
individual sub-task supervision. Because the overall objective is highly
under-constrained, we explicitly disentangle geometric-preserving intensity
mapping (skull-stripping and intensity normalization) and spatial
transformation (spatial normalization). Quantitative results show that our
model outperforms state-of-the-art methods which tackle only a single sub-task.
Our ablation experiments demonstrate the importance of the architecture design
we chose for NPP. Furthermore, NPP affords the user the flexibility to control
each of these tasks at inference time. The code and model are freely-available
at \url{https://github.com/Novestars/Neural-Pre-processing}.
( 2
min )
Ensembles based on k nearest neighbours (kNN) combine a large number of base
learners, each constructed on a sample taken from given training data. Typical
kNN-based ensembles predict the class of a test sample point from the k closest
observations in the training data, bounded by a spherical region around that
point. In this paper, a novel random projection extended neighbourhood rule
(RPExNRule) ensemble is proposed, where bootstrap samples from the given
training data are randomly projected into lower dimensions for additional
randomness in the base models and to preserve feature information. It uses the
extended neighbourhood rule (ExNRule) to fit kNN as base learners on the
randomly projected bootstrap samples.
( 2
min )
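A rough sketch of the ensemble construction described above, with plain majority-vote kNN standing in for the ExNRule base learner; the projection scale, model count, and toy data are our assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_predict(Xtr, ytr, Xte, k=3):
    """Plain majority-vote kNN, standing in for the ExNRule base learner."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return (ytr[idx].mean(axis=1) > 0.5).astype(int)

def rp_knn_ensemble(Xtr, ytr, Xte, n_models=25, target_dim=3, k=3):
    """Each base model: bootstrap the training data, push the bootstrap
    sample and the test data through the same random Gaussian projection
    into target_dim dimensions, then fit kNN on the projection."""
    n, p = Xtr.shape
    preds = []
    for _ in range(n_models):
        boot = rng.integers(0, n, n)                      # bootstrap indices
        P = rng.normal(size=(p, target_dim)) / np.sqrt(target_dim)
        preds.append(knn_predict(Xtr[boot] @ P, ytr[boot], Xte @ P, k))
    return (np.mean(preds, axis=0) > 0.5).astype(int)     # majority vote

# Toy problem: the label depends only on the (amplified) first feature,
# so the signal survives the random projections.
Xtr = rng.normal(size=(200, 10))
Xtr[:, 0] *= 3
Xte = rng.normal(size=(50, 10))
Xte[:, 0] *= 3
ytr = (Xtr[:, 0] > 0).astype(int)
yte = (Xte[:, 0] > 0).astype(int)
acc = (rp_knn_ensemble(Xtr, ytr, Xte) == yte).mean()
```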
Hello there!
Serge chat UI, with conversations on the left
I've recently been working on Serge, a self-hosted dockerized way of running LLaMa models with a decent UI & stored conversations. It currently supports Alpaca 7B, 13B and 30B and we're working on integrating it with LangChain and the ReAct chain agent.
I've tried my best to make the instructions dead easy, so it's all dockerized with a download manager for weights and it can be run with almost zero configuration required.
I think being able to run those models locally will be key to expanding their ability, and so I hope this can contribute to that.
Let me know if you have any feedback or suggestions on how to extend its capabilities!
GitHub: https://github.com/nsarrazin/serge
submitted by /u/SensitiveCranberry
[link] [comments]
( 47
min )
submitted by /u/Chipdoc
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/TheSlickGecko
[link] [comments]
( 41
min )
submitted by /u/GamesAndGlasses
[link] [comments]
( 41
min )
submitted by /u/oridnary_artist
[link] [comments]
( 41
min )
submitted by /u/Arun4033622
[link] [comments]
( 41
min )
submitted by /u/vadhavaniyafaijan
[link] [comments]
( 41
min )
submitted by /u/Emilie_Tunc
[link] [comments]
( 41
min )
submitted by /u/abstractcontrol
[link] [comments]
( 41
min )
Game developer CD PROJEKT RED today at the Game Developers Conference in San Francisco unveiled a technology preview for Cyberpunk 2077 with path tracing, coming April 11. Path tracing, also known as full ray tracing, accurately simulates light throughout an entire scene. It’s used by visual effects artists to create film and TV graphics that Read article >
( 5
min )
Gamers wanted better graphics. GPUs delivered. Those GPUs became the key to the world-changing AI revolution. Now gamers are reaping the benefits. At GDC 2023 in San Francisco this week, the gaming industry’s premier developers conference, NVIDIA made a series of announcements, including new games and game development tools that promise to accelerate innovations at Read article >
( 6
min )
Like old friends catching up over coffee, two industry icons reflected on how modern AI got its start, where it’s at today and where it needs to go next. Jensen Huang, founder and CEO of NVIDIA, interviewed AI pioneer Ilya Sutskever in a fireside chat at GTC. The talk was recorded a day after the Read article >
( 6
min )
Building AI applications is hard. Putting them to use across a business can be even harder. Less than one-third of enterprises that have begun adopting AI actually have it in production, according to a recent IDC survey. Businesses often realize the full complexity of operationalizing AI just prior to launching an application. Problems discovered so Read article >
( 7
min )
With Amazon Rekognition Custom Labels, you can have Amazon Rekognition train a custom model for object detection or image classification specific to your business needs. For example, Rekognition Custom Labels can find your logo in social media posts, identify your products on store shelves, classify machine parts in an assembly line, distinguish healthy and infected […]
( 7
min )
There has been a paradigm change in the mindshare of education customers who are now willing to explore new technologies and analytics. Universities and other higher learning institutions have collected massive amounts of data over the years, and now they are exploring options to use that data for deeper insights and better educational outcomes. You […]
( 7
min )
In this post, we show how to configure a new OAuth-based authentication feature for using Snowflake in Amazon SageMaker Data Wrangler. Snowflake is a cloud data platform that provides data solutions for data warehousing to data science. Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and […]
( 12
min )
Announcements GPT-4: Chatbots and Data Prep Ain’t What They Used To Be With the recent launch of OpenAI’s GPT-4, Google Bard and Anthropic’s Claude, reporters on the AI beat this week got to compare and contrast three prominent large language model (LLM) chatbot approaches. Their initial conclusion seems to be that all three have comparable… Read More »DSC Weekly 22 March 2023 – GPT-4: Chatbots and Data Prep Ain’t What They Used To Be
The post DSC Weekly 22 March 2023 – GPT-4: Chatbots and Data Prep Ain’t What They Used To Be appeared first on Data Science Central.
( 20
min )
Deductive domains are typical of many cognitive skills in that no single
problem-solving strategy is always optimal for solving all problems. It was
shown that students who know how and when to use each strategy (StrTime)
outperformed those who know neither and stick to the default strategy
(Default). In this work, students were trained on a logic tutor that supports a
default forward-chaining and a backward-chaining (BC) strategy, then a
probability tutor that only supports BC. We investigated three types of
interventions on teaching the Default students how and when to use which
strategy on the logic tutor: Example, Nudge and Presented. Meanwhile, StrTime
students received no interventions. Overall, our results show that the Nudge
students outperformed their Default peers and caught up with the StrTime
students on both tutors.
( 2
min )
Grokking is a phenomenon where a model trained on an algorithmic task first
overfits but, then, after a large amount of additional training, undergoes a
phase transition to generalize perfectly. We empirically study the internal
structure of networks undergoing grokking on the sparse parity task, and find
that the grokking phase transition corresponds to the emergence of a sparse
subnetwork that dominates model predictions. On an optimization level, we find
that this subnetwork arises when a small subset of neurons undergoes rapid norm
growth, whereas the other neurons in the network decay slowly in norm. Thus, we
suggest that the grokking phase transition can be understood to emerge from
competition of two largely distinct subnetworks: a dense one that dominates
before the transition and generalizes poorly, and a sparse one that dominates
afterwards.
( 2
min )
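For readers unfamiliar with the task, the sparse parity problem studied above is easy to generate: the label is the XOR of a small hidden subset of the input bits. A minimal generator (the bit counts and hidden subset are our choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_parity(n_samples, n_bits=40, parity_bits=(0, 1, 2)):
    """(k, n) sparse parity: inputs are random length-n bit strings; the
    label is the parity (XOR) of a fixed hidden subset of k bits."""
    X = rng.integers(0, 2, size=(n_samples, n_bits))
    y = X[:, list(parity_bits)].sum(axis=1) % 2
    return X, y

X, y = sparse_parity(1000)
```

The task is a natural testbed for grokking because memorizing the training set is easy while the generalizing solution must isolate the few relevant bits.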
Due to the rapid dynamics and a mass of uncertainties in the quantitative
markets, the issue of how to take appropriate actions to make profits in stock
trading remains a challenging one. Reinforcement learning (RL), as a
reward-oriented approach for optimal control, has emerged as a promising method
to tackle this strategic decision-making problem in such a complex financial
scenario. In this paper, we integrated two prior financial trading strategies
named constant proportion portfolio insurance (CPPI) and time-invariant
portfolio protection (TIPP) into multi-agent deep deterministic policy gradient
(MADDPG) and proposed two specifically designed multi-agent RL (MARL) methods:
CPPI-MADDPG and TIPP-MADDPG for investigating strategic trading in quantitative
markets. Afterward, we selected 100 different shares in the real financial
market to test these specifically proposed approaches. The experiment results
show that CPPI-MADDPG and TIPP-MADDPG approaches generally outperform the
conventional ones.
( 2
min )
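The CPPI strategy integrated above follows a simple allocation rule: invest a fixed multiple of the cushion (portfolio value above a protected floor) in the risky asset. A minimal, unlevered sketch (the clipping behaviour is our choice for illustration):

```python
def cppi_allocation(portfolio_value, floor, multiplier):
    """Constant proportion portfolio insurance (CPPI): put a fixed
    multiple of the cushion (value above the protected floor) into the
    risky asset and the remainder into the safe asset. Exposure is
    clipped to [0, portfolio_value] so the position stays unlevered."""
    cushion = max(portfolio_value - floor, 0.0)
    risky = min(multiplier * cushion, portfolio_value)
    return risky, portfolio_value - risky

# With a cushion of 20 and multiplier 3, 60 goes to the risky asset.
risky, safe = cppi_allocation(portfolio_value=100.0, floor=80.0, multiplier=3.0)
```

TIPP differs mainly in that the floor is ratcheted upward as a fixed fraction of the running maximum portfolio value, locking in gains as the portfolio grows.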
This work introduces the notion of intermediate concepts based on levels
structure to aid explainability for black-box models. The levels structure is a
hierarchical structure in which each level corresponds to features of a dataset
(i.e., a player-set partition). The level of coarseness increases from the
trivial partition, which comprises only singletons, to the coarsest one, which
contains only the grand coalition. In addition, it is possible to establish meronomies, i.e.,
part-whole relationships, via a domain expert that can be utilised to generate
explanations at an abstract level. We illustrate the usability of this approach
in a real-world car model example and the Titanic dataset, where intermediate
concepts aid in explainability at different levels of abstraction.
( 2
min )
In this work we present a deep learning approach to conduct hypothesis-free,
transcriptomics-based matching of drugs for diseases. Our proposed neural
network architecture is trained on approved drug-disease indications, taking as
input the relevant disease and drug differential gene expression profiles, and
learns to identify novel indications. We assemble an evaluation dataset of
disease-drug indications spanning 68 diseases and evaluate in silico our
approach against the most widely used transcriptomics-based matching baselines,
CMap and the Characteristic Direction. Our results show a more than 200%
improvement over both baselines in terms of standard retrieval metrics. We
further showcase our model's ability to capture different genes' expressions
interactions among drugs and diseases. We provide our trained models, data and
code to predict with them at https://github.com/healx/dgem-nn-public.
( 2
min )
Using deep learning methods to classify EEG signals can accurately identify
people's emotions. However, existing studies have rarely used information from
representations in other domains to guide feature selection in the
time-frequency domain. We propose a classification network of
EEG signals based on the cross-domain feature fusion method, which makes the
network more focused on the features most related to brain activities and
thinking changes by using the multi-domain attention mechanism. In addition, we
propose a two-step fusion method and apply these methods to the EEG emotion
recognition network. Experimental results show that our proposed network, which
combines multiple representations in the time-frequency domain and spatial
domain, outperforms previous methods on public datasets and achieves
state-of-the-art performance.
( 2
min )
Biological neural networks continue to inspire breakthroughs in neural
network performance. And yet, one key area of neural computation that has been
under-appreciated and under-investigated is biologically plausible,
energy-efficient spiking neural networks, whose potential is especially
attractive for low-power, mobile, or otherwise hardware-constrained settings.
We present a literature review of recent developments in the interpretation,
optimization, efficiency, and accuracy of spiking neural networks. Key
contributions include identification, discussion, and comparison of
cutting-edge methods in spiking neural network optimization, energy-efficiency,
and evaluation, starting from first principles so as to be accessible to new
practitioners.
( 2
min )
Continual learning is a problem for artificial neural networks that their
biological counterparts are adept at solving. Building on work using Sparse
Distributed Memory (SDM) to connect a core neural circuit with the powerful
Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is
a strong continual learner. We find that every component of our MLP variant
translated from biology is necessary for continual learning. Our solution is
also free from any memory replay or task information, and introduces novel
methods to train sparse networks that may be broadly applicable.
( 2
min )
Riemannian submanifold optimization with momentum is computationally
challenging because ensuring iterates remain on the submanifold often requires
solving difficult differential equations. We simplify such optimization
algorithms for the submanifold of symmetric positive-definite matrices with the
affine invariant metric. We propose a generalized version of the Riemannian
normal coordinates which dynamically trivializes the problem into a Euclidean
unconstrained problem. We use our approach to explain and simplify existing
approaches for structured covariances and develop efficient second-order
optimizers for deep learning without explicit matrix inverses.
( 2
min )
Federated Learning (FL) is a collaborative machine learning (ML) framework
that combines on-device training and server-based aggregation to train a common
ML model among distributed agents. In this work, we propose an asynchronous FL
design with periodic aggregation to tackle the straggler issue in FL systems.
Considering limited wireless communication resources, we investigate the effect
of different scheduling policies and aggregation designs on the convergence
performance. Driven by the importance of reducing the bias and variance of the
aggregated model updates, we propose a scheduling policy that jointly considers
the channel quality and training data representation of user devices. The
effectiveness of our channel-aware data-importance-based scheduling policy,
compared with state-of-the-art methods proposed for synchronous FL, is
validated through simulations. Moreover, we show that an "age-aware"
aggregation weighting design can significantly improve the learning performance
in an asynchronous FL setting.
( 2
min )
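One plausible reading of an "age-aware" aggregation weighting is sketched below; the exponential-decay rule and all constants are our assumptions, not necessarily the paper's design.

```python
import numpy as np

def age_aware_aggregate(global_model, updates, ages, lr=1.0, decay=0.5):
    """Down-weight each client update by decay**age, where age counts how
    many aggregation rounds old the update is, then apply the normalized
    weighted average as a step from the current global model."""
    w = np.array([decay ** a for a in ages], dtype=float)
    w /= w.sum()
    step = sum(wi * u for wi, u in zip(w, updates))
    return global_model + lr * step

model = np.zeros(3)
updates = [np.array([1.0, 0.0, 0.0]),   # fresh update (age 0)
           np.array([0.0, 1.0, 0.0])]   # stale update (age 2)
new_model = age_aware_aggregate(model, updates, ages=[0, 2])
```

The point is that stragglers still contribute, but stale gradients pull the global model around less than fresh ones.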
This article presents the DeepSense 6G dataset, which is a large-scale
dataset based on real-world measurements of co-existing multi-modal sensing and
communication data. The DeepSense 6G dataset is built to advance deep learning
research in a wide range of applications in the intersection of multi-modal
sensing, communication, and positioning. This article provides a detailed
overview of the DeepSense dataset structure, adopted testbeds, data collection
and processing methodology, deployment scenarios, and example applications,
with the objective of facilitating the adoption and reproducibility of
multi-modal sensing and communication datasets.
( 2
min )
Building accurate Deep Learning (DL) models for brain age prediction is a
very relevant topic in neuroimaging, as it could help better understand
neurodegenerative disorders and find new biomarkers. To estimate accurate and
generalizable models, large datasets have been collected, which are often
multi-site and multi-scanner. This large heterogeneity negatively affects the
generalization performance of DL models since they are prone to overfit
site-related noise. Recently, contrastive learning approaches have been shown
to be more robust against noise in data or labels. For this reason, we propose
a novel contrastive learning regression loss for robust brain age prediction
using MRI scans. Our method achieves state-of-the-art performance on the
OpenBHB challenge, yielding the best generalization capability and robustness
to site-related noise.
( 2
min )
In model-based reinforcement learning for safety-critical control systems, it
is important to formally certify system properties (e.g., safety, stability)
under the learned controller. However, as existing methods typically apply
formal verification \emph{after} the controller has been learned, it is
sometimes difficult to obtain any certificate, even after many iterations
between learning and verification. To address this challenge, we propose a
framework that jointly conducts reinforcement learning and formal verification
by formulating and solving a novel bilevel optimization problem, which is
differentiable by the gradients from the value function and certificates.
Experiments on a variety of examples demonstrate the significant advantages of
our framework over the model-based stochastic value gradient (SVG) method and
the model-free proximal policy optimization (PPO) method in finding feasible
controllers with barrier functions and Lyapunov functions that ensure system
safety and stability.
( 2
min )
In this paper, we present a new approach to mental state classification from
EEG signals by combining signal processing techniques and machine learning (ML)
algorithms. We evaluate the performance of the proposed method on a dataset of
EEG recordings collected during a cognitive load task and compared it to other
state-of-the-art methods. The results show that the proposed method achieves
high accuracy in classifying mental states and outperforms state-of-the-art
methods in terms of classification accuracy and computational efficiency.
( 2
min )
Randomized neural networks (randomized NNs), where only the terminal layer's
weights are optimized, constitute a powerful model class for reducing the
computational time of training. At the same time, these models
generalize surprisingly well in various regression and classification tasks. In
this paper, we give an exact macroscopic characterization (i.e., a
characterization in function space) of the generalization behavior of
randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs
correspond to a generalized additive model (GAM)-typed regression in which
infinitely many directions are considered: the infinite generalized additive
model (IGAM). The IGAM is formalized as solution to an optimization problem in
function space for a specific regularization functional and a fairly general
loss. This work is an extension to multivariate NNs of prior work, where we
showed how wide RSNs with ReLU activation behave like spline regression under
certain conditions and if the input is one-dimensional.
( 2
min )
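A minimal randomized shallow ReLU network of the kind analyzed above: the hidden weights are drawn at random and frozen, and only the terminal layer is fit, here in closed form by ridge regression. The width, ridge strength, and toy target are our choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_rsn(X, y, width=200, ridge=1e-3):
    """Randomized shallow ReLU network: draw the hidden weights at random,
    freeze them, and fit only the terminal layer by ridge regression."""
    W = rng.normal(size=(X.shape[1], width))
    b = rng.normal(size=width)
    H = np.maximum(X @ W + b, 0.0)                   # random ReLU features
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(width), H.T @ y)
    return W, b, beta

def predict_rsn(params, X):
    W, b, beta = params
    return np.maximum(X @ W + b, 0.0) @ beta

X = rng.uniform(-1.0, 1.0, size=(300, 1))
y = np.sin(3.0 * X[:, 0])
params = fit_rsn(X, y)
train_mse = float(np.mean((predict_rsn(params, X) - y) ** 2))
```

Because only the last layer is trained, fitting reduces to one linear solve, which is exactly why these models are cheap relative to full backpropagation.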
Our study focuses on determining the best weight windows for a weighted
moving average smoother under squared loss. We show that there exists an
optimal weight window that is symmetrical around its center. We study the class
of tapered weight windows, which decrease in weight as they move away from the
center. We formulate the corresponding least squares problem as a quadratic
program and finally as a projection of the origin onto a convex polytope.
Additionally, we provide some analytical solutions to the best window when some
conditions are met on the input data.
( 2
min )
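A concrete member of the tapered-window class studied above is the symmetric triangular window; the sketch below applies it as a weighted moving average. Note this is just one admissible window, not the optimal one the paper derives.

```python
import numpy as np

def tapered_smooth(x, half_width=2):
    """Weighted moving average with a symmetric triangular window:
    weights peak at the center, decrease linearly toward the edges,
    and are normalized to sum to one."""
    w = np.concatenate([np.arange(1, half_width + 2),
                        np.arange(half_width, 0, -1)]).astype(float)
    w /= w.sum()
    return np.convolve(x, w, mode="same")

x = np.array([0.0, 0.0, 9.0, 0.0, 0.0])    # a single spike
smoothed = tapered_smooth(x)               # the spike is spread symmetrically
```

Finding the best such window under squared loss then amounts to optimizing the weight vector `w` subject to symmetry and taper constraints, which is the quadratic program the paper formulates.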
This paper extends standard results from learning theory with independent
data to sequences of dependent data. Contrary to most of the literature, we do
not rely on mixing arguments or sequential measures of complexity and derive
uniform risk bounds with classical proof patterns and capacity measures. In
particular, we show that the standard classification risk bounds based on the
VC-dimension hold in the exact same form for dependent data, and further
provide Rademacher complexity-based bounds, that remain unchanged compared to
the standard results for the identically and independently distributed case.
Finally, we show how to apply these results in the context of scenario-based
optimization in order to compute the sample complexity of random programs with
dependent constraints.
( 2
min )
When considering a real log canonical threshold (RLCT) that gives a Bayesian
generalization error, in general, papers replace a mean error function with a
relatively simple polynomial whose RLCT corresponds to that of the mean error
function, and obtain its RLCT by resolving its singularities through an
algebraic operation called blow-up. Though it is known that the singularities
of any polynomial can be resolved by a finite number of blow-up iterations, it
is not clarified whether or not it is possible to resolve singularities of a
specific polynomial by applying a specific blow-up algorithm. Therefore, this
paper considers the blow-up algorithm for the class of
sum-of-products (sop) polynomials and their RLCTs.
( 2
min )
We consider a variant of contextual bandits in which the algorithm consumes
multiple resources subject to linear constraints on total consumption. This
problem generalizes contextual bandits with knapsacks (CBwK), allowing for
packing and covering constraints, as well as positive and negative resource
consumption. We present a new algorithm that is simple, computationally
efficient, and admits vanishing regret. It is statistically optimal for CBwK
when an algorithm must stop once some constraint is violated. Our algorithm
builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based
technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a
regression-based technique for contextual bandits. Our analysis leverages the
inherent modularity of both techniques.
( 2
min )
Lattice gauge equivariant convolutional neural networks (L-CNNs) are a
framework for convolutional neural networks that can be applied to non-Abelian
lattice gauge theories without violating gauge symmetry. We demonstrate how
L-CNNs can be equipped with global group equivariance. This allows us to extend
the formulation to be equivariant not just under translations but under global
lattice symmetries such as rotations and reflections. Additionally, we provide
a geometric formulation of L-CNNs and show how convolutions in L-CNNs arise as
a special case of gauge equivariant neural networks on SU($N$) principal
bundles.
( 2
min )
There are many open-source projects and indie-built demos around the GPT-4 API. Despite OpenAI's recent shift toward a more closed approach, open demos keep advancing the field and inspiring creativity. Here are some community projects that I find particularly interesting: https://github.com/radi-cho/awesome-gpt4. Feel free to share what you've been building or something that has fascinated you, either by joining the discussion here or by contributing to the repository :)
submitted by /u/radi-cho
[link] [comments]
( 43
min )
https://medium.com/coiled-computing/save-money-with-spot-d499edd46ae7
submitted by /u/dask-jeeves
[link] [comments]
( 43
min )
audioflux is a deep learning tool library for audio and music analysis and feature extraction. It supports dozens of time-frequency analysis transforms and hundreds of corresponding time-domain and frequency-domain feature combinations. These features can be fed to deep learning networks for training and used to study various tasks in the audio field, such as classification, separation, music information retrieval (MIR), and ASR.
Source Code: https://github.com/libAudioFlux/audioFlux
submitted by /u/Leo_D517
[link] [comments]
( 47
min )
Llama + Alpaca-13b + 64 COARS | ./Release/chat -t 120 -m ggml-alpaca-13b-q4 - YouTube
Alpaca.cpp demo https://github.com/antimatter15/alpaca.cpp
submitted by /u/APUsilicon
[link] [comments]
( 42
min )
I have been working on a very interesting project that aims to create an ensemble of models for a range of tasks in the Meta-ML domain. As someone who has had limited exposure to others interested in AI and has recently started exploring the field, I've made considerable progress on my own. However, I'm reaching out to find people with diverse perspectives and backgrounds who might be interested in joining me.
The project involves designing models, developing workflows, and identifying data sources. Despite my relatively short time in the AI realm, I've come up with some novel approaches, such as custom hyperparameter tuning systems and convolutional layering methods, that I believe will help improve the models' ability to learn relationships in clean data while also allowing them to func…
( 45
min )
Deforestation is a major concern in many tropical geographies where local rainforests are at severe risk of destruction. About 17% of the Amazon rainforest has been destroyed over the past 50 years, and some tropical ecosystems are approaching a tipping point beyond which recovery is unlikely. A key driver for deforestation is raw material extraction […]
( 11
min )
Amazon SageMaker customers can view and manage their quota limits through Service Quotas. In addition, they can view near real-time utilization metrics and create Amazon CloudWatch metrics to view and programmatically query SageMaker quotas. SageMaker helps you build, train, and deploy machine learning (ML) models with ease. To learn more, refer to Getting started with […]
( 5
min )
As organizations grow in size and scale, the complexities of running workloads increase, and the need to develop and operationalize processes and workflows becomes critical. Therefore, organizations have adopted technology best practices, including microservice architecture, MLOps, DevOps, and more, to improve delivery time, reduce defects, and increase employee productivity. This post introduces a best practice […]
( 12
min )
Companies across industries are looking to use interactive avatars to enhance digital experiences. But creating them is a complex, time-consuming process requiring state-of-the-art AI models that can see, hear, understand and communicate with end users. To ease this process, NVIDIA is providing creators and developers with real-time AI solutions through Omniverse Avatar Cloud Engine (ACE), Read article >
( 5
min )
With AI at its tipping point, AI-enabled computer vision is being used to address the world’s most challenging problems in nearly every industry. At GTC, a global conference for the era of AI and the metaverse running through Thursday, March 23, NVIDIA announced technology updates poised to drive the next wave of vision AI adoption. Read article >
( 6
min )
Powerful AI technologies are revolutionizing 3D content creation — whether by enlivening realistic characters that show emotion or turning simple texts into imagery. The brightest minds, artists and creators are gathering at NVIDIA GTC, a free, global conference on AI and the metaverse, taking place online through Thursday, March 23.
( 9
min )
The automotive industry is undergoing a digital revolution, driven by breakthroughs in accelerated computing, AI and the industrial metaverse. Automakers are digitalizing every phase of the product lifecycle — including concept and styling, design and engineering, software and electronics, smart factories, autonomous driving and retail — using the NVIDIA Omniverse platform and AI. Based on Read article >
( 7
min )
Transportation industry trailblazers are propelling their next-generation vehicles by building on NVIDIA DRIVE end-to-end solutions, which span the cloud to the car. The world’s best-selling new energy vehicle (NEV) brand BYD announced at NVIDIA GTC that it’s using the NVIDIA DRIVE Orin centralized compute platform to power an even wider range of vehicles within its Read article >
( 6
min )
Mitsui & Co., Ltd., one of Japan’s largest business conglomerates, is collaborating with NVIDIA on Tokyo-1 — an initiative to supercharge the nation’s pharmaceutical leaders with technology, including high-resolution molecular dynamics simulations and generative AI models for drug discovery. Announced today at the NVIDIA GTC global AI conference, the Tokyo-1 project features an NVIDIA DGX Read article >
( 7
min )
Digitalization that combines AI and simulation is redefining how industrial products are created and transforming how people interact with the digital world. To help enterprises tackle complex new workloads, NVIDIA has unveiled the third generation of its NVIDIA OVX computing system. OVX is designed to power large-scale digital twins built on NVIDIA Omniverse Enterprise, a Read article >
( 5
min )
Healthcare enterprises globally are working with NVIDIA to drive AI-accelerated solutions that are detecting diseases earlier from medical images, delivering critical insights to care teams and revolutionizing drug discovery workflows. NVIDIA Clara, a suite of software and services that powers AI healthcare solutions, is enabling this transformation industry-wide. The Clara ecosystem includes BioNeMo for drug Read article >
( 7
min )
Powerful AI technologies are making a massive impact in 3D content creation and game development. Whether creating realistic characters that show emotion or turning simple texts into imagery, AI tools are becoming fundamental to developer workflows — and this is just the start. At NVIDIA GTC and the Game Developers Conference (GDC), learn how the Read article >
( 7
min )
BMW Group is at the forefront of a key new manufacturing trend — going digital-first by using the virtual world to optimize layouts, robotics and logistics systems years before production really starts. The automaker announced today with NVIDIA at GTC that it’s expanding its use of the NVIDIA Omniverse platform for building and operating industrial Read article >
( 6
min )
Developers and creators can better realize the massive potential of generative AI, simulation and the industrial metaverse with new Omniverse Connectors and other updates to NVIDIA Omniverse, a platform for creating and operating metaverse applications. Omniverse Cloud, a platform-as-a-service unveiled today at NVIDIA GTC, equips users with a range of simulation and generative AI capabilities Read article >
( 7
min )
NVIDIA announced today at GTC that Omniverse Cloud will be hosted on Microsoft Azure, increasing access to Isaac Sim, the company’s platform for developing and managing AI-based robots. The company also said that a full lineup of Jetson Orin modules is now available, offering a performance leap for edge AI and robotics applications. “The world’s Read article >
( 6
min )
CCC Intelligent Solutions (CCC) has become the first company in the auto insurance industry to deliver an AI-powered repair estimating solution, called CCC Estimate – STP, short for straight-through processing. The Chicago-based auto-claims technology powerhouse uses AI, insurer-driven rules and CCC’s vast ecosystem to deliver repair estimates in seconds, instead of days. It’s a technological Read article >
( 6
min )
As a sports commentator for a professional lacrosse team, Grant Farhall knows the value in having the right teammates. As the chief product officer for Getty Images, a global visual-content creator and marketplace, he believes the collaboration between his company and NVIDIA is an excellent pairing for taking generative AI to the next level. The Read article >
( 5
min )
Large language models available today are incredibly knowledgeable, but act like time capsules — the information they capture is limited to the data available when they were first trained. If trained a year ago, for example, an LLM powering an enterprise’s AI chatbot won’t know about the latest products and services at the business. With Read article >
( 6
min )
The results are in, and they point to a new era in energy-efficient computing. In tests of real workloads, the NVIDIA Grace CPU Superchip scored 2x performance gains over x86 processors at the same power envelope across major data center CPU applications. That opens up a whole new set of opportunities. It means data centers Read article >
( 6
min )
Microsoft, Tencent and Baidu are adopting NVIDIA CV-CUDA for computer vision AI. NVIDIA CEO Jensen Huang highlighted work in content understanding, visual search and deep learning Tuesday as he announced the beta release for NVIDIA’s CV-CUDA — an open-source, GPU-accelerated library for computer vision at cloud scale. “Eighty percent of internet traffic is video, user-generated Read article >
( 6
min )
OpenAssistant bot is live on /r/ask_open_assistant. There are some limitations to the Reddit bot; you can also try the model in chat mode at https://huggingface.co/spaces/olivierdehaene/chat-llm-streaming. The model is available for free download at https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b.
Prompt it by creating a new text post (it responds to the text body of the post), by starting a comment with !OpenAssistant, or by replying to it directly.
submitted by /u/pixiegirl417
[link] [comments]
( 44
min )
How to fine-tune Facebook's 30-billion-parameter LLaMA on the Alpaca dataset.
Blog post: https://abuqader.substack.com/p/releasing-alpaca-30b
Weights: https://huggingface.co/baseten/alpaca-30b
submitted by /u/imgonnarelph
[link] [comments]
( 45
min )
This is a simple wrapper that adds any imaginable complex context to each question submitted to the OpenAI API. The main goal is to enhance the accuracy of the answers in a TRANSPARENT way for end users.
https://github.com/citiususc/Smarty-GPT
submitted by /u/usc-ur
[link] [comments]
( 43
min )
I have followed this YouTube tutorial that can be run with my environment, but it seems relatively basic (for context, the game in the video is quite simple while the environment I am using is like a more complex chess)
I have heard of DDQN and other improvements to DQN, but was wondering if there is anything within basic DQN (like maybe stuff to do with the network) that can be tuned to produce better results
(Thanks for taking the time to look at this)
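For reference, one commonly cited refinement that stays very close to basic DQN is Double DQN, which changes only the target computation: the online network picks the action and the target network evaluates it. A minimal sketch with hypothetical Q-values (not tied to any particular environment):

```python
def dqn_target(reward, gamma, done, q_target_next):
    # Standard DQN: the target network both selects and evaluates the next
    # action, which tends to overestimate Q-values.
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def double_dqn_target(reward, gamma, done, q_online_next, q_target_next):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, reducing overestimation bias.
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]

# Toy transition where the two networks disagree on the best action.
t_dqn = dqn_target(1.0, 0.9, False, q_target_next=[5.0, 3.0])
t_double = double_dqn_target(1.0, 0.9, False,
                             q_online_next=[1.0, 2.0],
                             q_target_next=[5.0, 3.0])
```

Within basic DQN itself, target-network update frequency, replay-buffer size, and the exploration schedule are the usual knobs worth tuning before changing the algorithm.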
submitted by /u/PainisPingas
[link] [comments]
( 42
min )
This is a guest post co-written with Antony Vance from Intel. Customers are always looking for ways to improve the performance and response times of their machine learning (ML) inference workloads without increasing the cost per transaction and without sacrificing the accuracy of the results. Running ML workloads on Amazon SageMaker running Amazon Elastic Compute […]
( 8
min )
We thought of listing down some tactics that can help you tackle digital challenges smartly. Check out the blog to know what we are talking about.
The post Tackling the Evolving Tech Landscape the Smarter Way appeared first on Data Science Central.
( 21
min )
It’s easy to think of LLMs (large language models) as just ‘hallucinating’ or mere generators of text. A glorified LSTM so to speak. While there are some limitations of LLMs (and indeed they are evolving), a far more interesting question to explore is: How can LLMs be used in enterprise applications? In many ways, enterprise… Read More »Enterprise use cases for GPT-3: How to chat with your own data
The post Enterprise use cases for GPT-3: How to chat with your own data appeared first on Data Science Central.
( 19
min )
AI technologies like ChatGPT are necessitating a fundamental overhaul of our educational systems and institutions. Getting the right answers to predetermined tests is no longer sufficient in an age where AI can access, integrate, and recite knowledge billions if not trillions of times faster than the human mind. So, what are the skills, capabilities, and… Read More »Future of Education: Application not Regurgitation of Knowledge – Part II
The post Future of Education: Application not Regurgitation of Knowledge – Part II appeared first on Data Science Central.
( 23
min )
The e-commerce industry has been at the forefront of transformation in the era of technology — it has reshaped everything from how customers shop to how whole e-commerce operations run. Technology has led to significant changes in the e-commerce and retail industries over the past few years. Consumers now have more access to… Read More »E-commerce in 2023 — Top 5 Tech Trends that will Reshape the Industry
The post E-commerce in 2023 — Top 5 Tech Trends that will Reshape the Industry appeared first on Data Science Central.
( 26
min )
We propose a certainty-equivalence scheme for adaptive control of scalar
linear systems subject to additive, i.i.d. Gaussian disturbances and bounded
control input constraints, without requiring prior knowledge of the bounds of
the system parameters, nor the control direction. Assuming that the system is
at-worst marginally stable, mean square boundedness of the closed-loop system
states is proven. Lastly, numerical examples are presented to illustrate our
results.
( 2
min )
Great success has been achieved in 6-DoF grasp learning from point cloud
input, yet the computational cost due to the orderlessness of point sets
remains a concern. Alternatively, we explore grasp generation from
RGB-D input in this paper. The proposed solution, Keypoint-GraspNet, detects
the projections of the gripper keypoints in image space and then recovers the
SE(3) poses with a PnP algorithm. A synthetic dataset based on primitive
shapes and grasp families is constructed to examine our idea. Metric-based
evaluation reveals that our method outperforms the baselines in terms of
grasp proposal accuracy, diversity, and time cost. Finally, robot
experiments show a high success rate, demonstrating the potential of the idea
in real-world applications.
( 2
min )
We revisit the standard formulation of the tabular actor-critic algorithm as a
two time-scale stochastic approximation with value function computed on a
faster time-scale and policy computed on a slower time-scale. This emulates
policy iteration. We begin by observing that reversal of the time scales will
in fact emulate value iteration and is a legitimate algorithm. We provide a
proof of convergence and compare the two empirically with and without function
approximation (with both linear and nonlinear function approximators) and
observe that our proposed critic-actor algorithm performs on par with
actor-critic in terms of both accuracy and computational effort.
( 2
min )
We examine the problem of regret minimization when the learner is involved in
a continuous game with other optimizing agents: in this case, if all players
follow a no-regret algorithm, it is possible to achieve significantly lower
regret relative to fully adversarial environments. We study this problem in the
context of variationally stable games (a class of continuous games which
includes all convex-concave and monotone games), and when the players only have
access to noisy estimates of their individual payoff gradients. If the noise is
additive, the game-theoretic and purely adversarial settings enjoy similar
regret guarantees; however, if the noise is multiplicative, we show that the
learners can, in fact, achieve constant regret. We achieve this faster rate via
an optimistic gradient scheme with learning rate separation -- that is, the
method's extrapolation and update steps are tuned to different schedules,
depending on the noise profile. Subsequently, to eliminate the need for
delicate hyperparameter tuning, we propose a fully adaptive method that attains
nearly the same guarantees as its non-adapted counterpart, while operating
without knowledge of either the game or of the noise profile.
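To give a flavor of extrapolation/update separation, here is a generic extragradient-style sketch on the bilinear min-max game f(x, y) = x*y, with separate step sizes for the extrapolation and update steps (an illustration of the general idea, not the paper's exact method or rates):

```python
def extragradient_bilinear(x, y, eta_extrap, eta_update, steps):
    """Extragradient on min_x max_y x*y with separated learning rates."""
    for _ in range(steps):
        # Extrapolation (look-ahead) step with its own learning rate.
        x_half = x - eta_extrap * y   # d(x*y)/dx = y
        y_half = y + eta_extrap * x   # d(x*y)/dy = x
        # Update step uses the gradients at the extrapolated point.
        x, y = x - eta_update * y_half, y + eta_update * x_half
    return x, y

# Iterates spiral into the equilibrium (0, 0); plain simultaneous
# gradient descent/ascent would diverge on this game.
x, y = extragradient_bilinear(1.0, 1.0, eta_extrap=0.1, eta_update=0.1, steps=500)
```

Tuning `eta_extrap` and `eta_update` to different schedules, as the abstract describes, is what adapts the method to additive versus multiplicative noise.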
( 3
min )
Alzheimer's Disease (AD) is a progressive neurodegenerative disease and the
leading cause of dementia. Early diagnosis is critical for patients to benefit
from potential intervention and treatment. The retina has been hypothesized as
a diagnostic site for AD detection owing to its anatomical connection with the
brain. AI models developed for this purpose have yet to provide a rational
explanation for their decisions or to infer the stage of the disease's
progression. Along this direction, we propose a novel model-agnostic
explainable-AI framework, called Granular Neuron-level Explainer (LAVA), an
interpretation prototype that probes into intermediate layers of the
Convolutional Neural Network (CNN) models to assess the AD continuum directly
from the retinal imaging without longitudinal or clinical evaluation. This
method is applied to validate the retinal vasculature as a biomarker and
diagnostic modality for Alzheimer's Disease (AD) evaluation. UK Biobank
cognitive tests and vascular morphological features suggest LAVA shows strong
promise and effectiveness in identifying AD stages across the progression
continuum.
( 2
min )
Performing classification on noisy, crowdsourced image datasets can prove
challenging even for the best neural networks. Two issues which complicate the
problem on such datasets are class imbalance and ground-truth uncertainty in
labeling. The AL-ALL and AL-PUB datasets -- consisting of tightly cropped,
individual characters from images of ancient Greek papyri -- are strongly
affected by both issues. The application of ensemble modeling to such datasets
can help identify images where the ground-truth is questionable and quantify
the trustworthiness of those samples. As such, we apply stacked generalization
consisting of nearly identical ResNets with different loss functions: one
utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler Divergence
(KLD). Both networks use labels drawn from the crowdsourced consensus. For the
second network, the KLD is calculated with respect to the proposed Normalized
Distribution of Annotations (NDA). For our ensemble model, we apply a k-nearest
neighbors model to the outputs of the CXE and KLD networks. Individually, the
ResNet models have approximately 93% accuracy, while the ensemble model
achieves an accuracy of > 95%. We also perform an analysis of the Shannon
entropy of the various models' output distributions to measure classification
uncertainty. Our results suggest that entropy is useful for predicting model
misclassifications.
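As a small standalone illustration of the entropy analysis (not the authors' code), the Shannon entropy of a model's output distribution can serve as an uncertainty score:

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy of a predicted class distribution; higher = less certain."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A confident prediction has low entropy; a uniform one is maximally uncertain.
confident = shannon_entropy([0.97, 0.01, 0.01, 0.01])
uncertain = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # = 2 bits for 4 classes
```

Samples whose output entropy is high are natural candidates for the "questionable ground-truth" category the abstract describes.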
( 3
min )
This paper proposes an extension of regression trees by quadratic
unconstrained binary optimization (QUBO). Regression trees are very popular
prediction models that are trainable with tabular datasets, but their accuracy
is insufficient because the decision rules are too simple. The proposed method
extends the decision rules in decision trees to multi-dimensional boundaries.
Such an extension is generally unimplementable because of computational
limitations; however, the proposed method transforms the training process into
QUBO, which enables an annealing machine to solve this problem.
( 2
min )
Predictive modelling is often reduced to finding the best model that
optimizes a selected performance measure. But what if the second-best model
describes the data equally well but in a completely different way? What about
the third? Is it possible that the most effective models learn completely
different relationships in the data? Inspired by Anscombe's quartet, this paper
introduces Rashomon's quartet, a synthetic dataset for which four models from
different classes have practically identical predictive performance. However,
their visualization reveals drastically distinct ways of understanding the
correlation structure in data. The introduced simple illustrative example aims
to further facilitate visualization as a mandatory tool to compare predictive
models beyond their performance. We need to develop insightful techniques for
the explanatory analysis of model sets.
( 2
min )
Transformers achieve great performance on Visual Question Answering (VQA).
However, their systematic generalization capabilities, i.e., handling novel
combinations of known concepts, remain unclear. We reveal that Neural Module
Networks (NMNs), i.e., question-specific compositions of modules that tackle a
sub-task, achieve better or similar systematic generalization performance than
the conventional Transformers, even though NMNs' modules are CNN-based. In
order to address this shortcoming of Transformers with respect to NMNs, in this
paper we investigate whether and how modularity can bring benefits to
Transformers. Namely, we introduce Transformer Module Network (TMN), a novel
NMN based on compositions of Transformer modules. TMNs achieve state-of-the-art
systematic generalization performance in three VQA datasets, improving more
than 30% over standard Transformers for novel compositions of sub-tasks. We
show that not only the module composition but also the module specialization
for each sub-task are the key of such performance gain.
( 2
min )
In this paper we provide a generalization of the concept of cohesion as
introduced recently by Berenhaut, Moore and Melvin [Proceedings of the National
Academy of Sciences, 119 (4) (2022)]. The formulation presented builds on the
technique of partitioned local depth by distilling two key probabilistic
concepts: local relevance and support division. Earlier results are extended
within the new context, and examples of applications to revealing communities
in data with uncertainty are included.
( 2
min )
Tabular question answering (TQA) presents a challenging setting for neural
systems by requiring joint reasoning of natural language with large amounts of
semi-structured data. Unlike humans who use programmatic tools like filters to
transform data before processing, language models in TQA process tables
directly, resulting in information loss as table size increases. In this paper
we propose ToolWriter to generate query-specific programs and detect when to
apply them to transform tables and align them with the TQA model's
capabilities. Focusing ToolWriter to generate row-filtering tools improves the
state-of-the-art for WikiTableQuestions and WikiSQL with the most performance
gained on long tables. By investigating headroom, our work highlights the
broader potential for programmatic tools combined with neural components to
manipulate large amounts of structured data.
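To make the idea of row-filtering tools concrete, here is a hypothetical sketch (table and names are illustrative, not from the paper) of pruning a table with a query-specific filter before it reaches the TQA model:

```python
def filter_rows(table, column, predicate):
    """Keep only the rows whose `column` value satisfies `predicate`.

    `table` is a list of dicts, one per row. A query-specific "tool" generated
    for, say, "Which cities have a population over 1M?" would instantiate
    the predicate below.
    """
    return [row for row in table if predicate(row[column])]

table = [
    {"city": "Springfield", "population": 170_000},
    {"city": "Metropolis", "population": 2_500_000},
    {"city": "Gotham", "population": 8_100_000},
]
# The generated program narrows the table so the QA model sees fewer rows,
# which matters most for the long tables where the paper reports its gains.
relevant = filter_rows(table, "population", lambda p: p > 1_000_000)
```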
( 2
min )
Many natural language processing tasks benefit from long inputs, but
processing long documents with Transformers is expensive -- not only due to
quadratic attention complexity but also from applying feedforward and
projection layers to every token. However, not all tokens are equally
important, especially for longer documents. We propose CoLT5, a long-input
Transformer model that builds on this intuition by employing conditional
computation, devoting more resources to important tokens in both feedforward
and attention layers. We show that CoLT5 achieves stronger performance than
LongT5 with much faster training and inference, achieving SOTA on the
long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably
make use of extremely long inputs, showing strong gains up to 64k input length.
( 2
min )
This paper describes our participation in the shared task of hate speech
detection, which is one of the subtasks of the CERIST NLP Challenge 2022. Our
experiments evaluate the performance of six transformer models and their
combination using two ensemble approaches. The best results on the training
set, in a five-fold cross-validation scenario, were obtained by using the
ensemble approach based on the majority vote. The evaluation of this approach
on the test set resulted in an F1-score of 0.60 and an accuracy of 0.86.
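A majority-vote ensemble over per-model predictions can be sketched as follows (a generic illustration, not the authors' code):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one label sequence per model; returns the fused labels."""
    fused = []
    for per_sample in zip(*predictions):  # collect each model's vote per sample
        fused.append(Counter(per_sample).most_common(1)[0][0])
    return fused

# Three models labeling four samples (1 = hate speech, 0 = not).
model_outputs = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
labels = majority_vote(model_outputs)  # [1, 0, 1, 0]
```

With an odd number of models, ties cannot occur for binary labels; for an even ensemble a tie-breaking rule would be needed.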
( 2
min )
Many imaging inverse problems, such as image-dependent
in-painting and dehazing, are challenging because their forward
models are unknown or depend on unknown latent parameters. While one can solve
such problems by training a neural network with vast quantities of paired
training data, such paired training data is often unavailable. In this paper,
we propose a generalized framework for training image reconstruction networks
when paired training data is scarce. In particular, we demonstrate the ability
of image denoising algorithms and, by extension, denoising diffusion models to
supervise network training in the absence of paired training data.
( 2
min )
This paper presents the winning system for the zero-shot Spanish framing
detection task, which also achieves competitive places in eight additional
languages. The challenge of the framing detection task lies in identifying a
set of 14 frames when only a few or zero samples are available, i.e., a
multilingual multi-label few- or zero-shot setting. Our developed solution
employs a pre-training procedure based on multilingual Transformers using a
label-aware contrastive loss function. In addition to describing the system, we
perform an embedding space analysis and ablation study to demonstrate how our
pre-training procedure supports framing detection to advance computational
framing analysis.
( 2
min )
Collective motion is a ubiquitous phenomenon in nature, inspiring engineers,
physicists and mathematicians to develop mathematical models and bio-inspired
designs. Collective motion at small to medium group sizes ($\sim$10-1000
individuals, also called the 'mesoscale'), can show nontrivial features due to
stochasticity. Therefore, characterizing both the deterministic and stochastic
aspects of the dynamics is crucial in the study of mesoscale collective
phenomena. Here, we use a physics-inspired, neural-network based approach to
characterize the stochastic group dynamics of interacting individuals, through
a stochastic differential equation (SDE) that governs the collective dynamics
of the group. We apply this technique on both synthetic and real-world
datasets, and identify the deterministic and stochastic aspects of the dynamics
using drift and diffusion fields, enabling us to make novel inferences about
the nature of order in these systems.
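For a 1-D SDE dx = f(x) dt + g(x) dW, the drift and diffusion fields can be estimated from trajectory increments. Here is a simplified constant-coefficient sketch of the standard Kramers-Moyal estimates (not the paper's neural-network method):

```python
def drift_diffusion_estimates(x, dt):
    """Constant-coefficient Kramers-Moyal estimates from a 1-D trajectory.

    To leading order in small dt:
        drift     ~ E[dx] / dt
        diffusion ~ E[dx^2] / (2 * dt)
    """
    dx = [b - a for a, b in zip(x, x[1:])]
    n = len(dx)
    drift = sum(dx) / (n * dt)
    diffusion = sum(d * d for d in dx) / (2 * n * dt)
    return drift, diffusion

# A noise-free, constant-velocity trajectory: drift = 0.1, diffusion is small.
traj = [0.1 * i for i in range(11)]
drift, diffusion = drift_diffusion_estimates(traj, dt=1.0)
```

The state-dependent version bins the increments by x (or fits a network, as in the paper) to recover drift and diffusion as fields rather than constants.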
( 2
min )
Significant advancements in type 1 diabetes treatment have been made in the
development of state-of-the-art Artificial Pancreas Systems (APS). However,
lapses currently exist in the timely treatment of unsafe blood glucose (BG)
levels, especially in the case of rebound hyperglycemia. We propose a machine
learning (ML) method for predictive BG scenario categorization that outputs
messages alerting the patient to upcoming BG trends to allow for earlier,
educated treatment. In addition to standard notifications of predicted
hypoglycemia and hyperglycemia, we introduce BG scenario-specific alert
messages and the preliminary steps toward precise basal suggestions for the
prevention of rebound hyperglycemia. Experimental evaluation on the DCLP3
clinical dataset achieves >98% accuracy and >79% precision for predicting
rebound high events for patient alerts.
( 2
min )
Projection-based model order reduction on nonlinear manifolds has been
recently proposed for problems with slowly decaying Kolmogorov n-width such as
advection-dominated ones. These methods often use neural networks for manifold
learning and showcase improved accuracy over traditional linear
subspace-reduced order models. A disadvantage of the previously proposed
methods is the potential high computational costs of training the networks on
high-fidelity solution snapshots. In this work, we propose and analyze a novel
method that overcomes this disadvantage by training a neural network only on
subsampled versions of the high-fidelity solution snapshots. This method
coupled with collocation-based hyper-reduction and Gappy-POD allows for
efficient and accurate surrogate models. We demonstrate the validity of our
approach on a 2D Burgers problem.
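The Gappy-POD ingredient is compact enough to sketch: reconstruct a full state from a handful of sampled entries by least squares in a truncated POD basis. A toy numpy example on synthetic snapshots (illustrative only, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic snapshot matrix: states that live in a low-dimensional subspace.
n, m, r = 200, 50, 5                      # state dim, #snapshots, POD rank
modes = rng.standard_normal((n, r))
snapshots = modes @ rng.standard_normal((r, m))

# POD basis from the SVD of the snapshot matrix.
U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
Ur = U[:, :r]

# "Gappy" measurement: observe only k entries of a new state.
x_true = modes @ rng.standard_normal(r)
k = 20
pts = rng.choice(n, size=k, replace=False)

# Least squares in the sampled rows: min_a || Ur[pts] a - x_true[pts] ||
a, *_ = np.linalg.lstsq(Ur[pts], x_true[pts], rcond=None)
x_rec = Ur @ a

err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
print(err)  # near zero here, since the state lies exactly in the POD subspace
```

In the paper's setting the subsampled rows would come from hyper-reduction collocation points rather than random choice.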
( 2
min )
Previously, we proposed a probabilistic data generation model represented by
an unobservable tree and a sequential updating method to calculate a posterior
distribution over a set of trees. The set is called a meta-tree. In this paper,
we propose a more efficient batch updating method.
( 2
min )
Adversarial examples are inputs to machine learning models that an attacker
has intentionally designed to confuse the model into making a mistake. Such
examples pose a serious threat to the applicability of machine-learning-based
systems, especially in life- and safety-critical domains. To address this
problem, the area of adversarial robustness investigates mechanisms behind
adversarial attacks and defenses against these attacks. This survey reviews
literature that focuses on the effects of data used by a model on the model's
adversarial robustness. It systematically identifies and summarizes the
state-of-the-art research in this area and further discusses gaps of knowledge
and promising future research directions.
( 2
min )
Multivariate networks are commonly found in real-world data-driven
applications. Uncovering and understanding the relations of interest in
multivariate networks is not a trivial task. This paper presents a visual
analytics workflow for studying multivariate networks to extract associations
between different structural and semantic characteristics of the networks
(e.g., what are the combinations of attributes largely relating to the density
of a social network?). The workflow consists of a neural-network-based learning
phase to classify the data based on the chosen input and output attributes, a
dimensionality reduction and optimization phase to produce a simplified set of
results for examination, and finally an interpreting phase conducted by the
user through an interactive visualization interface. A key part of our design
is a composite variable construction step that remodels nonlinear features
obtained by neural networks into linear features that are intuitive to
interpret. We demonstrate the capabilities of this workflow with multiple case
studies on networks derived from social media usage and also evaluate the
workflow through an expert interview.
( 2
min )
For the multivariate linear regression model with unknown covariance, the
corrected Akaike information criterion is the minimum variance unbiased
estimator of the expected Kullback--Leibler discrepancy. In this study, based
on the loss estimation framework, we show its inadmissibility as an estimator
of the Kullback--Leibler discrepancy itself, instead of the expected
Kullback--Leibler discrepancy. We provide improved estimators of the
Kullback--Leibler discrepancy that work well in reduced-rank situations and
examine their performance numerically.
( 2
min )
In this paper we provide a generalization of the concept of cohesion as
introduced recently by Berenhaut, Moore and Melvin [Proceedings of the National
Academy of Sciences, 119 (4) (2022)]. The formulation presented builds on the
technique of partitioned local depth by distilling two key probabilistic
concepts: local relevance and support division. Earlier results are extended
within the new context, and examples of applications to revealing communities
in data with uncertainty are included.
( 2
min )
The kernel-based method has been successfully applied in linear system
identification using stable kernel designs. From a Gaussian process
perspective, it automatically provides probabilistic error bounds for the
identified models from the posterior covariance, which are useful in robust and
stochastic control. However, the error bounds require knowledge of the true
hyperparameters in the kernel design and are demonstrated to be inaccurate with
estimated hyperparameters for lightly damped systems or in the presence of high
noise. In this work, we provide reliable quantification of the estimation error
when the hyperparameters are unknown. The bounds are obtained by first
constructing a high-probability set for the true hyperparameters from the
marginal likelihood function and then finding the worst-case posterior
covariance within the set. The proposed bound is proven to contain the true
model with a high probability and its validity is verified in numerical
simulation.
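The two-step recipe (a likelihood-based hyperparameter set, then the worst-case posterior variance over it) can be sketched with a toy 1D GP regression. The kernel, grid, and threshold below are illustrative stand-ins, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy GP regression with an RBF kernel; the lengthscale is "unknown".
def kern(a, b, ell):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

xs = np.linspace(0, 5, 30)
ys = np.sin(xs) + 0.1 * rng.standard_normal(30)
noise = 0.1**2

def log_marglik(ell):
    K = kern(xs, xs, ell) + noise * np.eye(len(xs))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (ys @ np.linalg.solve(K, ys) + logdet)

def post_std(ell, xq):
    K = kern(xs, xs, ell) + noise * np.eye(len(xs))
    kq = kern(np.array([xq]), xs, ell)
    var = float(1.0 - kq @ np.linalg.solve(K, kq.T))
    return np.sqrt(max(var, 0.0))

# 1) High-probability hyperparameter set from the marginal likelihood:
#    keep lengthscales within a threshold c of the maximum.
grid = np.linspace(0.2, 2.0, 40)
ll = np.array([log_marglik(e) for e in grid])
c = 2.0                                    # threshold (illustrative)
hp_set = grid[ll >= ll.max() - c]

# 2) Worst-case posterior std over that set at a query point.
xq = 2.5
robust = max(post_std(e, xq) for e in hp_set)
nominal = post_std(grid[np.argmax(ll)], xq)
print(nominal, robust)   # robust >= nominal by construction
```

The robust bound is never smaller than the one at the maximum-likelihood hyperparameters, which is the point of the construction.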
( 2
min )
submitted by /u/fignewtgingrich
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/Peaking_AI
[link] [comments]
( 41
min )
submitted by /u/liquidocelotYT
[link] [comments]
( 41
min )
submitted by /u/oridnary_artist
[link] [comments]
( 41
min )
Hi guys, just wanted to share a new app I worked on which uses ChatGPT to recommend gift ideas based on interests and remind you of birthdays. Please let me know what you think :)
https://apps.apple.com/de/app/giftgo-gift-ideas-with-ai/id1660850886?l=en
submitted by /u/SmoresDaniel
[link] [comments]
( 41
min )
submitted by /u/northernmostroasts
[link] [comments]
( 41
min )
submitted by /u/RhythmRobber
[link] [comments]
( 56
min )
submitted by /u/Calatravo
[link] [comments]
( 41
min )
submitted by /u/lostlifon
[link] [comments]
( 50
min )
submitted by /u/HolyOtherness
[link] [comments]
( 45
min )
Recently, John Carmack suggested the creation of a "canonical list of references from a leading figure," referring to a never-released reading list given to him by Ilya Sutskever.
While there may be an undue interest in that specific list, MLR is such a big field that it's difficult to know where to start. What are the major papers that are relevant to state-of-the-art work being done in 2023? Perhaps we may crowd-source a list here?
submitted by /u/alfredr
[link] [comments]
( 44
min )
submitted by /u/actmademewannakms
[link] [comments]
( 43
min )
When deploying ML models with FastAPI we always had to write our own serialisation code for numpy.ndarray and PIL.Image. Not only did we replace FastAPI with an up to 100x faster C-level library a couple of weeks ago, but we have also recently added support for all the fancy Pythonic types on both client and server sides.
Check it out on GitHub/Unum-Cloud/UJRPC
https://preview.redd.it/3m73l6qodpoa1.png?width=1648&format=png&auto=webp&s=975d47f7f35a6a842a3454cccb24dd92e08816e0
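For context, here is one generic way to make a numpy.ndarray JSON-safe (dtype + shape + base64 payload). This only illustrates the serialization problem being solved; it is not UJRPC's actual wire format:

```python
import base64
import json
import numpy as np

def encode(arr: np.ndarray) -> dict:
    """Pack an ndarray into a JSON-serializable dict."""
    return {
        "dtype": str(arr.dtype),
        "shape": arr.shape,
        "data": base64.b64encode(np.ascontiguousarray(arr).tobytes()).decode(),
    }

def decode(obj: dict) -> np.ndarray:
    """Inverse of encode()."""
    raw = base64.b64decode(obj["data"])
    return np.frombuffer(raw, dtype=obj["dtype"]).reshape(obj["shape"])

a = np.arange(12, dtype=np.float32).reshape(3, 4)
wire = json.dumps(encode(a))           # what would travel over the RPC
b = decode(json.loads(wire))
print(np.array_equal(a, b))            # True
```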
submitted by /u/vov_or
[link] [comments]
( 43
min )
Preliminary results give credence to some of the claims made by OpenAI regarding performance gains achieved by GPT-4 across domains. Unanswered questions remain regarding training data used and possible leakage. Tools used were Langchain and the current API endpoints (chatgpt-3.5-turbo and gpt-4).
https://twitter.com/K_Hebenstreit/status/1636789765189308416
submitted by /u/N00B1ST
[link] [comments]
( 43
min )
submitted by /u/michaelthwan_ai
[link] [comments]
( 47
min )
submitted by /u/mlejva
[link] [comments]
( 44
min )
submitted by /u/radi-cho
[link] [comments]
( 43
min )
submitted by /u/AF15A
[link] [comments]
( 41
min )
Is there a single-task, multi-scene environment using continuous action spaces? Single-task, multi-scene envs are similar to gym-super-mario-bros and CoinRun in procgen, but those all use discrete action spaces. Thank you!!!!!
submitted by /u/Substantial_Lake_236
[link] [comments]
( 41
min )
submitted by /u/liquidocelotYT
[link] [comments]
( 41
min )
submitted by /u/radi-cho
[link] [comments]
( 43
min )
submitted by /u/MysteryInc152
[link] [comments]
( 44
min )
submitted by /u/Taenk
[link] [comments]
( 43
min )
In this video, you will learn how to save your conversations with ChatGPT as PDF, PNG or JSON files. The tutorial will guide you through the simple steps to export your conversations in different formats for various purposes.
https://youtu.be/eMqLFrk_tes
submitted by /u/aeiswhatiwant
[link] [comments]
( 41
min )
submitted by /u/MsNunez
[link] [comments]
( 41
min )
submitted by /u/sanya-g
[link] [comments]
( 45
min )
submitted by /u/manhesh
[link] [comments]
( 41
min )
submitted by /u/RandomDude6699
[link] [comments]
( 41
min )
submitted by /u/SudoSharma
[link] [comments]
( 41
min )
submitted by /u/FT05-biggoye
[link] [comments]
( 41
min )
submitted by /u/TheFootCrew_TFC
[link] [comments]
( 41
min )
I'm struggling with the idea of the actual game state, the portion of it I use in the abstracted game state, and the Markov or memorylessness property.
The game is called the "lizard game" from this video and has a simple 3x3 grid where the agent (a lizard) starts in the bottom left and moves about, trying to maximize rewards:
+-------------+-------------+-------------+
| crickets(1) |             |             |
+-------------+-------------+-------------+
|             | bird        |             |
+-------------+-------------+-------------+
| lizard      |             | crickets(5) |
+-------------+-------------+-------------+
The rewards are simple:
moving into an empty spot yields -1
moving into crickets(1) yields +2
moving into crickets(5) yields +10 and terminates the episode
moving into bird y…
( 49
min )
I could never figure this part out all these years, and now that I am doing a YouTube series on it I have to make sure I understand it before I publish the next video. Here is a snippet from the paper 'Regret Minimization in Games with Incomplete Information' that I am specifically referring to. In Eq. 4, the policy averaging considers the probability of the player's past actions.
This is from the 2008 paper. In later papers that do Monte Carlo CFR, that term disappears and those algorithms average the policies directly, without considering the player's own path probability. Why is that?
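For reference, the averaging step in question can be written as follows (my reading of Eq. 4, from memory, so double-check against the paper; the weights $\pi_i^{\sigma^t}(I)$ are player $i$'s own reach probabilities of information set $I$):

```latex
\bar{\sigma}^T(a \mid I) \;=\;
  \frac{\sum_{t=1}^{T} \pi_i^{\sigma^t}(I)\, \sigma^t(a \mid I)}
       {\sum_{t=1}^{T} \pi_i^{\sigma^t}(I)}
```

One commonly cited intuition for why sampling variants drop the explicit weight is that the sampling scheme already visits $I$ with frequency proportional to the player's reach, so the weighting happens implicitly in expectation; treat that as a pointer to verify, not a definitive answer.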
submitted by /u/abstractcontrol
[link] [comments]
( 44
min )
submitted by /u/ABDULKADER90H
[link] [comments]
( 41
min )
submitted by /u/YungMixtape2004
[link] [comments]
( 41
min )
submitted by /u/Microsis
[link] [comments]
( 41
min )
submitted by /u/redditguyjustinp
[link] [comments]
( 41
min )
submitted by /u/much_successes
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/GamesAndGlasses
[link] [comments]
( 43
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/oridnary_artist
[link] [comments]
( 41
min )
submitted by /u/HamletsLastLine
[link] [comments]
( 42
min )
submitted by /u/justine01923
[link] [comments]
( 44
min )
submitted by /u/Past_Captain_9058
[link] [comments]
( 41
min )
submitted by /u/TallSide7746
[link] [comments]
( 41
min )
submitted by /u/Smaug117
[link] [comments]
( 41
min )
I tried the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of the box without any finetuning (RWKV is a 100% RNN trained on 100% Pile v1 and nothing else):
https://preview.redd.it/fciatottq7oa1.png?width=1046&format=png&auto=webp&s=f88304a77b09e367e8b9812ba4b841e028481645
You are welcome to try it in RWKV 14B Gradio (click examples below the panel):
https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio
Tips: try "Expert Response" or "Expert Long Response" or "Expert Full Response" too.
https://preview.redd.it/qo71b85vq7oa1.png?width=2516&format=png&auto=webp&s=5d4467ba4bbc9016839760b3f3873f06c8b4bc6f
ChatRWKV v2 is now using a CUDA kernel to optimize INT8 inference (23 token/s on 3090): https://github.com/BlinkDL/ChatRWKV
Upgrade to latest code and "pip install rwkv --upgrade" to 0.5.0, and set os.environ["RWKV_CUDA_ON"] = '1' in v2/chat.py to enjoy the speed.
The inference speed (and VRAM consumption) of RWKV is independent of ctxlen, because it's an RNN (note: currently the preprocessing of a long prompt takes more VRAM but that can be optimized because we can process in chunks).
Meanwhile I find the latest RWKV-4-Pile-14B-20230313-ctx8192-test1050 model can utilize a long ctx:
https://preview.redd.it/a68dw0hzq7oa1.png?width=398&format=png&auto=webp&s=80570ccc844fa31efa1282d5b2106b9986e35b5a
submitted by /u/bo_peng
[link] [comments]
( 47
min )
submitted by /u/ABDULKADER90H
[link] [comments]
( 41
min )
Organizations use messaging platforms like Microsoft Teams to bring the right people together to securely communicate with each other and collaborate to get work done. Microsoft Teams captures invaluable organizational knowledge in the form of the information that flows through it as users collaborate. However, making this knowledge easily and securely available to users can […]
( 9
min )
We tend to impute AI with human-like qualities. However, choosing to give your AI system a personality has its advantages and…
( 18
min )
As artificial intelligence (AI) continues to advance and become more pervasive in our daily lives, it is crucial that we consider the…
( 7
min )
Artificial Intelligence (AI) has transformed the way we live, work, and communicate, and it is now playing a significant role in the art…
( 7
min )
No content preview
( 1
min )
Image-to-image reconstruction problems with free or inexpensive metadata in
the form of class labels appear often in biological and medical image domains.
Existing text-guided or style-transfer image-to-image approaches do not
translate to datasets where additional information is provided as discrete
classes. We introduce and implement a model which combines image-to-image and
class-guided denoising diffusion probabilistic models. We train our model on a
real-world dataset of microscopy images used for drug discovery, with and
without incorporating metadata labels. By exploring the properties of
image-to-image diffusion with relevant labels, we show that class-guided
image-to-image diffusion can improve the meaningful content of the
reconstructed images and outperform the unguided model in useful downstream
tasks.
( 2
min )
Neural network approaches to approximate the ground state of quantum
hamiltonians require the numerical solution of a highly nonlinear optimization
problem. We introduce a statistical learning approach that makes the
optimization trivial by using kernel methods. Our scheme is an approximate
realization of the power method, where supervised learning is used to learn the
next step of the power iteration. We show that the ground state properties of
arbitrary gapped quantum hamiltonians can be reached with polynomial resources
under the assumption that the supervised learning is efficient. Using kernel
ridge regression, we provide numerical evidence that the learning assumption is
verified by applying our scheme to find the ground states of several
prototypical interacting many-body quantum systems, both in one and two
dimensions, showing the flexibility of our approach.
( 2
min )
Sequential decision making in the real world often requires finding a good
balance of conflicting objectives. In general, there exists a plethora of
Pareto-optimal policies that embody different patterns of compromises between
objectives, and it is technically challenging to obtain them exhaustively using
deep neural networks. In this work, we propose a novel multi-objective
reinforcement learning (MORL) algorithm that trains a single neural network via
policy gradient to approximately obtain the entire Pareto set in a single run
of training, without relying on linear scalarization of objectives. The
proposed method works in both continuous and discrete action spaces with no
design change of the policy network. Numerical experiments in benchmark
environments demonstrate the practicality and efficacy of our approach in
comparison to standard MORL baselines.
( 2
min )
Figuring out small molecule binding sites in target proteins, in the
resolution of either pocket or residue, is critical in many virtual and real
drug-discovery scenarios. Since it is not always easy to find such binding
sites based on domain knowledge or traditional methods, different deep learning
methods that predict binding sites out of protein structures have been
developed in recent years. Here we present a new such deep learning algorithm,
that significantly outperformed all state-of-the-art baselines at both
resolutions, pocket and residue. This good performance was
also demonstrated in a case study involving the protein human serum albumin and
its binding sites. Our algorithm included new ideas both in the model
architecture and in the training method. For the model architecture, it
incorporated SE(3)-invariant geometric self-attention layers that operate on
top of residue-level CNN outputs. This residue-level processing of the model
allowed a transfer learning between the two resolutions, which turned out to
significantly improve the binding pocket prediction. Moreover, we developed
a novel augmentation method based on protein homology, which prevented our model
from over-fitting. Overall, we believe that our contribution to the literature
is twofold. First, we provided a new computational method for binding site
prediction that is relevant to real-world applications, as shown by the good
performance on different benchmarks and case study. Second, the novel ideas in
our method (the model architecture, transfer learning, and the homology
augmentation) would serve as useful components in
future works.
( 3
min )
The secret’s out. Thanks to ChatGPT, everyone knows about the power of modern AI. To find out what’s coming next, tune in to NVIDIA founder and CEO Jensen Huang’s keynote address at NVIDIA GTC on Tuesday, March 21, at 8 a.m. Pacific. Huang will share his vision for the future of AI and how NVIDIA Read article >
( 4
min )
submitted by /u/OpenDILab
[link] [comments]
( 41
min )
submitted by /u/johnaldmilligan
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
Hello everyone. As a side project, I created a website that generated over 7,000 articles in one week, each with roughly 800 to 1000 words, all using the GPT-3.5 Turbo API in a fully automated manner. I created a Python script (also generated by GPT) where I feed a list of topics, and it generates the content and automatically posts it to WordPress. In addition, I integrated the Google Images API to capture an image and post it automatically as well. Currently, I can create around 10 posts per minute. And what about the cost? To generate these 7,000 posts with 7,000 images, I have spent $40 so far!
So far, however, I don't know how Google or Bing will handle this AI-generated content and if it will affect SEO, but I'm here to check it out.
If you are interested in how I did it, along with some videos, check my post: https://www.tigove.com/how/how-i-created-a-website-with-7000-post-with-chatgpt/
submitted by /u/maurimbr
[link] [comments]
( 42
min )
submitted by /u/SuspiciousPillbox
[link] [comments]
( 41
min )
submitted by /u/CeFurkan
[link] [comments]
( 41
min )
submitted by /u/MarkFulton
[link] [comments]
( 41
min )
submitted by /u/sidianmsjones
[link] [comments]
( 41
min )
https://medium.com/@wiroll/fake-news-chatbots-and-the-state-of-journalism-bf95c187e582
Basically...I (ChatGPT) wrote an op-ed with the essential hypothesis of, "let's double speeds in school zones in the name of safety" and...it got published...in a place I don't live...with no verification.
Problematic?
submitted by /u/KillBosby
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/Csai
[link] [comments]
( 41
min )
submitted by /u/jaredigital62
[link] [comments]
( 48
min )
submitted by /u/DCGirl20874
[link] [comments]
( 41
min )
submitted by /u/theluk246
[link] [comments]
( 41
min )
Part 1: Understanding Zero-Shot Learning
( 12
min )
submitted by /u/MysteryInc152
[link] [comments]
( 46
min )
We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 (248M parameters) model implemented in HuggingFace. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI).
In ~20 hours on a single GPU, we achieve ~40 RougeL on the SNI test set, compared to ~42 RougeL of the original model available on HuggingFace Hub and pre-trained through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.
Our core contribution is not the T5 model itself, which follows the HuggingFace implementation. Instead, we optimise everything else in the training pipeline to offer you a user-friendly starting template for your NLP application/research.
We are keen to hear your suggestions to improve the codebase further.
Github: https://github.com/PiotrNawrot/nanoT5
Twitter: https://twitter.com/p_nawrot/status/1636373725397520384
https://preview.redd.it/zluas7u235oa1.png?width=1152&format=png&auto=webp&s=68d413aa702b2160785a9f95e5cb00318fbfcdb4
submitted by /u/korec1234
[link] [comments]
( 44
min )
bloomz.cpp allows running inference of BLOOM-like models in pure C/C++ (inspired by llama.cpp). It supports all models that can be loaded with BloomForCausalLM.from_pretrained(). For example, you can achieve 16 tokens per second on an M1 Pro.
submitted by /u/hackerllama
[link] [comments]
( 43
min )
Hello! I read the following article about Microsoft laying off their AI Ethics team: https://www.cmswire.com/customer-experience/microsoft-cuts-ai-ethics-and-society-team-as-part-of-layoffs/
In your experience, what value do AI ethics teams add? Do they actually add useful insight, or do they serve more as a PR thing? I’ve heard conflicting anecdotes for each side. Is there anything you think AI ethics as a field can do to be more useful and to get more change? Thanks!
submitted by /u/namey-name-name
[link] [comments]
( 54
min )
An update is now available for NVIDIA Canvas, the free beta app that harnesses the power of AI to help artists quickly turn simple brushstrokes into realistic landscapes.
( 6
min )
Disney Dreamlight Valley is streaming from Steam and Epic Games Store on GeForce NOW starting today. It’s one of two new games this week that members can stream with beyond-fast performance using a GeForce NOW Ultimate membership. Game as if using a PC on any device — at up to 4K resolution and 120 frames Read article >
( 5
min )
Peter Ma was bored in his high school computer science class. So he decided to teach himself something new: how to use artificial intelligence to find alien life. That’s how he eventually became the lead author of a groundbreaking study published in Nature Astronomy. The study reveals how Ma and his co-authors used AI to Read article >
( 4
min )
submitted by /u/deeplearningperson
[link] [comments]
( 41
min )
Python comes across as an object-oriented high-level programming language with dynamic semantics that allows rapid application development. It has become a general-purpose programming language for a number of reasons. It is the ready pick for data science enthusiasts who look forward to majoring in the field with the requisite essentials. Not just that, Python has… Read More »What Makes Python a Quick Pick for Data Analysis and Data Science?
The post What Makes Python a Quick Pick for Data Analysis and Data Science? appeared first on Data Science Central.
( 20
min )
This is a study on the potential widespread usage of alternative fuel
vehicles, linking them with the socio-economic status of the respective
consumers as well as the impact on the resulting air quality index. Research in
this area aims to leverage machine learning techniques in order to promote
appropriate policies for the proliferation of alternative fuel vehicles such as
electric vehicles with due justice to different population groups. The Pearson
correlation coefficient is deployed in modeling the relationships between
socio-economic data, air quality index and data on alternative fuel vehicles.
Linear regression is used to conduct predictive modeling on air quality index
as per the adoption of alternative fuel vehicles, based on socio-economic
factors. This work exemplifies artificial intelligence for social good.
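The two tools named here are a few lines of numpy. A sketch on synthetic data (the variables and coefficients below are made up for illustration, not the study's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: median income (k$), EV adoption rate,
# and an air-quality index for 100 regions.
income = rng.normal(60, 15, 100)
ev_rate = 0.002 * income + rng.normal(0, 0.03, 100)
aqi = 80 - 150 * ev_rate + rng.normal(0, 5, 100)

# Pearson correlation between EV adoption and AQI.
r = np.corrcoef(ev_rate, aqi)[0, 1]

# Linear regression: predict AQI from EV adoption and income.
X = np.column_stack([np.ones(100), ev_rate, income])
beta, *_ = np.linalg.lstsq(X, aqi, rcond=None)
print(r)       # strongly negative by construction
print(beta)    # intercept, EV coefficient, income coefficient
```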
( 2
min )
Moiré engineering in atomically thin van der Waals heterostructures creates
artificial quantum materials with designer properties. We solve the many-body
problem of interacting electrons confined to a moiré superlattice potential
minimum (the moiré atom) using a 2D fermionic neural network. We show that
strong Coulomb interactions in combination with the anisotropic moiré
potential lead to striking "Wigner molecule" charge density distributions
observable with scanning tunneling microscopy.
( 2
min )
Diffusion models have become a popular approach for image generation and
reconstruction due to their numerous advantages. However, most diffusion-based
inverse problem-solving methods only deal with 2D images, and even recently
published 3D methods do not fully exploit the 3D distribution prior. To address
this, we propose a novel approach using two perpendicular pre-trained 2D
diffusion models to solve the 3D inverse problem. By modeling the 3D data
distribution as a product of 2D distributions sliced in different directions,
our method effectively addresses the curse of dimensionality. Our experimental
results demonstrate that our method is highly effective for 3D medical image
reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing
MRI, and sparse-view CT. Our method can generate high-quality voxel volumes
suitable for medical applications.
( 2
min )
Artwork recommendation is challenging because it requires understanding how
users interact with highly subjective content, the complexity of the concepts
embedded within the artwork, and the emotional and cognitive reflections they
may trigger in users. In this paper, we focus on efficiently capturing the
elements (i.e., latent semantic relationships) of visual art for personalized
recommendation. We propose and study recommender systems based on textual and
visual feature learning techniques, as well as their combinations. We then
perform a small-scale and a large-scale user-centric evaluation of the quality
of the recommendations. Our results indicate that textual features compare
favourably with visual ones, whereas a fusion of both captures the most
suitable hidden semantic relationships for artwork recommendation. Ultimately,
this paper contributes to our understanding of how to deliver content that
suitably matches the user's interests and how they are perceived.
( 2
min )
Adversarial training (AT) methods have been found to be effective against
adversarial attacks on deep neural networks. Many variants of AT have been
proposed to improve its performance. Pang et al. [1] have recently shown that
incorporating hypersphere embedding (HE) into the existing AT procedures
enhances robustness. We observe that the existing AT procedures are not
designed for the HE framework, and thus fail to adequately learn the angular
discriminative information available in the HE framework. In this paper, we
propose integrating HE into AT with regularization terms that exploit the rich
angular information available in the HE framework. Specifically, our method,
termed angular-AT, adds regularization terms to AT that explicitly enforce
weight-feature compactness and inter-class separation; all expressed in terms
of angular features. Experimental results show that angular-AT further improves
adversarial robustness.
( 2
min )
The performance of fault diagnosis systems is highly affected by data quality
in cyber-physical power systems. These systems generate massive amounts of data
that overburden the system with excessive computational costs. Another issue is
the presence of noise in recorded measurements, which prevents building a
precise decision model. Furthermore, the diagnostic model is often provided
with a mixture of redundant measurements that may keep it from learning the
normal and fault distributions. This paper presents the effect of feature
engineering on mitigating the aforementioned challenges in cyber-physical
systems. Feature selection and dimensionality reduction methods are combined
with decision models to simulate data-driven fault diagnosis in a 118-bus power
system. A comparative study is enabled accordingly to compare several advanced
techniques in both domains. Dimensionality reduction and feature selection
methods are compared both jointly and separately. Finally, experiments are
concluded, and a setting is suggested that enhances data quality for fault
diagnosis.
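A toy numpy sketch of the joint comparison, with a nearest-centroid classifier standing in for the paper's decision models and synthetic "measurements" replacing the 118-bus data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "measurements": 300 samples, 40 channels, 2 classes
# (normal vs fault); only the first 5 channels are informative.
n, d = 300, 40
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, d))
X[:, :5] += 2.0 * y[:, None]               # a fault shifts a few channels

def nearest_centroid_acc(Z, y):
    c0, c1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
    pred = np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)
    return (pred == y).mean()

# Dimensionality reduction: project onto the top-5 principal components.
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_pca = Xc @ Vt[:5].T

# Feature selection: keep the 5 channels with the largest class separation.
score = np.abs(X[y == 1].mean(0) - X[y == 0].mean(0))
Z_sel = X[:, np.argsort(score)[-5:]]

print(nearest_centroid_acc(Z_pca, y), nearest_centroid_acc(Z_sel, y))
```

Both routes recover the informative subspace here; the paper's point is that on real power-system data the two families behave differently, which is what the comparative study measures.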
( 2
min )
The outbreak of the COVID-19 pandemic revealed the criticality of timely
intervention in a situation exacerbated by a shortage in medical staff and
equipment. Pain-level screening is the initial step toward identifying the
severity of patient conditions. Automatic recognition of state and feelings
helps in identifying patient symptoms to take immediate adequate action and
providing a patient-centric medical plan tailored to a patient's state. In this
paper, we propose a framework for pain-level detection for deployment in the
United Arab Emirates and assess its performance using the most used approaches
in the literature. Our results show that a deployment of a pain-level deep
learning detection framework is promising in identifying the pain level
accurately.
( 2
min )
Several approximate inference methods have been proposed for deep discrete
latent variable models. However, non-parametric methods which have previously
been successfully employed for classical sparse coding models have largely been
unexplored in the context of deep models. We propose a non-parametric iterative
algorithm for learning discrete latent representations in such deep models.
Additionally, to learn scale invariant discrete features, we propose local data
scaling variables. Lastly, to encourage sparsity in our representations, we
propose a Beta-Bernoulli process prior on the latent factors. We evaluate our
sparse coding model, coupled with different likelihood models, across datasets
with varying characteristics and compare our results to
current amortized approximate inference methods.
( 2
min )
Hall effect thrusters are among the most versatile and popular electric
propulsion systems for space use. The industry trend toward interplanetary
missions is driving advances in the design of such propulsion systems. It is
understood that correct sizing of the discharge channel in a Hall effect
thruster greatly impacts performance. Since the complete physics model of such
a propulsion system is not yet optimized for fast computations and design
iterations, most thrusters are designed using so-called scaling laws. This
work, however, focuses on a rather novel approach, which is outlined less
frequently in the literature than the ordinary scaling design approach. Using
deep machine learning, it is possible to create a predictive performance model
that can be used to effortlessly obtain a Hall thruster design with the
required characteristics, using far less computational power than designing
from scratch and with far more flexibility than the usual scaling approach.
( 2
min )
Our research deals with the optimization version of the set partition
problem, where the objective is to minimize the absolute difference between the
sums of the two disjoint partitions. Although this problem is known to be
NP-hard and requires exponential time to solve, we propose a less demanding
version of this problem where the goal is to find a locally optimal solution.
In our approach, we consider local optimality with respect to any movement of
at most two elements. To accomplish this, we developed an algorithm that can
generate a locally optimal solution in at most $O(N^2)$ time and $O(N)$ space.
Our algorithm can handle arbitrary input precisions and does not require
positive or integer inputs. Hence, it can be applied in various problem
scenarios with ease.
( 2
min )
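The paper's O(N^2) algorithm itself isn't reproduced here, but the notion of local optimality under one- and two-element moves can be sketched in a few lines. The function name and the naive restart-on-improvement loop below are illustrative choices, not the authors' method:

```python
def partition_local_search(nums):
    """Toy sketch: split nums into two lists, then apply single-element
    moves and pairwise swaps until neither strictly reduces the absolute
    difference of the two sums (local optimality w.r.t. moving <= 2 elements)."""
    a, b = list(nums), []
    sa, sb = sum(a), 0

    def try_improve():
        nonlocal sa, sb
        best = abs(sa - sb)
        # move a single element to the other side
        for src, dst, s_src, s_dst in ((a, b, sa, sb), (b, a, sb, sa)):
            for i, x in enumerate(src):
                if abs((s_src - x) - (s_dst + x)) < best:
                    dst.append(src.pop(i))
                    sa, sb = sum(a), sum(b)
                    return True
        # swap one element from each side
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                if abs((sa - x + y) - (sb - y + x)) < best:
                    a[i], b[j] = y, x
                    sa, sb = sum(a), sum(b)
                    return True
        return False

    while try_improve():
        pass
    return a, b

a_side, b_side = partition_local_search([1, 2, 3, 4, 6])
```

Note this sketch recomputes sums eagerly and restarts the scan after each improvement, so it does not match the paper's O(N^2)-time, O(N)-space bound; it only illustrates the neighbourhood being searched.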
who's applying and what are you planning to build??? https://www.axios.com/2023/03/15/mozilla-responsible-ai-challenge
submitted by /u/joodfish
[link] [comments]
( 43
min )
Here are the samples. My favourite is this one! Which one is your favourite?
These samples are the product of a transformer (encoder) model trained on only 3 hours of music. Each sample is seeded by the first four bars of a real piece of music. These are the final samples before I completely overhaul the pre-training stage. The idea is to go from about 2 hours of MIDI to over 500 hours. I'm very excited to see how this affects the sample quality.
If anyone is interested in following the project, star the GitHub repo and follow me on Twitter.
submitted by /u/ustainbolt
[link] [comments]
( 43
min )
Baidu will unveil its conversational AI ERNIE Bot, powered by Baidu's in-house LLMs, on March 16. The ERNIE LLM was first proposed as a language understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters.
ERNIE 1.0: https://arxiv.org/abs/1904.09223
ERNIE 2.0: https://arxiv.org/abs/1907.12412
ERNIE 3.0: https://arxiv.org/abs/2112.12731
ERNIE for text-to-image: https://arxiv.org/abs/2210.15257
ERNIE Bot live-stream on YouTube: https://www.youtube.com/watch?v=ukvEUI3x0vI
submitted by /u/kizumada
[link] [comments]
( 43
min )
submitted by /u/Hytsol
[link] [comments]
( 41
min )
submitted by /u/JaviFesser
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/Prunestand
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/Salt-Entertainer3777
[link] [comments]
( 41
min )
submitted by /u/Number_5_alive
[link] [comments]
( 41
min )
submitted by /u/npsedhain
[link] [comments]
( 42
min )
submitted by /u/Peaking_AI
[link] [comments]
( 41
min )
submitted by /u/vjmde
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/arnolds112
[link] [comments]
( 41
min )
submitted by /u/jkterry1
[link] [comments]
( 41
min )
Hello everyone,
I'd like to show you a "working AlphaZero implementation that's simple enough to be able to understand what's going on at a quick glance, without sacrificing too much."
Link: https://github.com/scascin0/alphazero
submitted by /u/ayan0k0ji
[link] [comments]
( 41
min )
Global leader in convenient foods and beverages PepsiCo is deploying advanced machine vision technology from startup KoiReader Technologies, powered by the NVIDIA AI platform and GPUs, to improve efficiency and accuracy in its distribution process. PepsiCo has identified KoiReader’s technology as a solution to enable greater efficiency in reading warehouse labels. This AI-powered innovation helps Read article >
( 5
min )
It all started with two software engineers and a tomato farmer on a West Coast road trip. Visiting farms to survey their needs, the three hatched a plan at an apple orchard: build a highly adaptable 3D vision AI system for automating field tasks. Verdant, based in the San Francisco Bay Area, is developing AI Read article >
( 7
min )
Tens of thousands of AWS customers use AWS machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. For customers who have been developing ML models on premises, such as their local desktop, they want to migrate their legacy ML models to the AWS Cloud to fully take advantage of […]
( 11
min )
Hey r/MachineLearning,
We are collecting a hand-crafted curated list of awesome curated lists closely related to machine learning.
Here is the link to the Github repo: https://github.com/zhimin-z/awesome-awesome-machine-learning
Do any lists need to be included from your perspective? Please let me know, or feel free to submit a pull request.
The motivation underlying this project is that so many awesome lists regarding machine learning exist on GitHub. But it gradually becomes a mental burden to remember where to look, as the ML world is progressing faster and faster these days.
Hence this project: a unification that stitches together all awesome lists closely related to machine learning.
submitted by /u/happybirdie007
[link] [comments]
( 43
min )
submitted by /u/SupPandaHugger
[link] [comments]
( 41
min )
submitted by /u/psprady
[link] [comments]
( 41
min )
submitted by /u/hottown
[link] [comments]
( 41
min )
submitted by /u/VausProd
[link] [comments]
( 41
min )
submitted by /u/Farnectarine4825
[link] [comments]
( 41
min )
submitted by /u/ai-lover
[link] [comments]
( 41
min )
submitted by /u/Dalembert
[link] [comments]
( 42
min )
submitted by /u/Peaking_AI
[link] [comments]
( 41
min )
Learn how to create mind-blowing AI art with just a few keywords! This guide will show you how to use an AI model to generate stunning digital art, step by step!
https://youtu.be/HmrqjqyxeCo
submitted by /u/TheQuestionStation
[link] [comments]
( 41
min )
submitted by /u/Repeat-or
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/messyp
[link] [comments]
( 42
min )
Today, tens of thousands of customers are building, training, and deploying machine learning (ML) models using Amazon SageMaker to power applications that have the potential to reinvent their businesses and customer experiences. These ML models have been increasing in size and complexity over the last few years, which has led to state-of-the-art accuracies across a […]
( 9
min )
When I was getting my MBA at the University of Iowa in 1981, my advisor Gary Fethke (who would later serve as University of Iowa interim president and Emeritus Professor in Business Analytics) convinced me to take a PhD class in econometrics. I think he was trying to punish me or something. I was totally… Read More »Future of Education: Application not Regurgitation of Knowledge – Part I
The post Future of Education: Application not Regurgitation of Knowledge – Part I appeared first on Data Science Central.
( 23
min )
As part of my teaching for AI at the University of Oxford, I read a large number of books based on the maths of data science. Data Science and Machine Learning: Mathematical and Statistical Methods is a book I recommend if you like the maths of data science. There is a pdf… Read More »Data Science and Machine Learning Mathematical and Statistical Methods
The post Data Science and Machine Learning Mathematical and Statistical Methods appeared first on Data Science Central.
( 20
min )
Announcements Our Revamped Submission Guidelines Since our migration to WordPress, we have been looking to solidify a set of guidelines for writers to look at prior to submitting that will give them a rough idea of the quality standards the editors are looking for. Many of you will be familiar with our Tips and Tricks… Read More »DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines
The post DSC Weekly 14 March 2023 – Our Revamped Submission Guidelines appeared first on Data Science Central.
( 20
min )
Paper - https://arxiv.org/abs/2303.05398
submitted by /u/MysteryInc152
[link] [comments]
( 45
min )
submitted by /u/MasterBin-IIAU
[link] [comments]
( 45
min )
Researchers used machine learning to build faster and more efficient hash functions, which are a key component of databases.
( 10
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 42
min )
This post is co-written with Mahima Agarwal, Machine Learning Engineer, and Deepak Mettem, Senior Engineering Manager, at VMware Carbon Black VMware Carbon Black is a renowned security solution offering protection against the full spectrum of modern cyberattacks. With terabytes of data generated by the product, the security analytics team focuses on building machine learning (ML) […]
( 11
min )
Amazon SageMaker Ground Truth Plus is a managed data labeling service that makes it easy to label data for machine learning (ML) applications. One common use case is semantic segmentation, which is a computer vision ML technique that involves assigning class labels to individual pixels in an image. For example, in video frames captured by […]
( 7
min )
(Image Source) Remote work has skyrocketed in the last three years. And with that comes increased productivity, happier employees, and lower overhead costs. But unfortunately, it’s not all sunshine and rainbows for companies with remote teams. Studies show that employees working from home increase the frequency of cyberattacks by 238%. And with the global average… Read More »How to Implement a Data Privacy and Protection Strategy for Remote Teams
The post How to Implement a Data Privacy and Protection Strategy for Remote Teams appeared first on Data Science Central.
( 23
min )
submitted by /u/gwern
[link] [comments]
( 41
min )
We introduce weak barycenters of a family of probability distributions, based
on the recently developed notion of optimal weak transport of mass by Gozlan
et al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical
analysis of this object and discuss its interpretation in the light of convex
ordering between probability measures. In particular, we show that, rather than
averaging the input distributions in a geometric way (as the Wasserstein
barycenter based on classic optimal transport does) weak barycenters extract
common geometric information shared by all the input distributions, encoded as
a latent random variable that underlies all of them. We also provide an
iterative algorithm to compute a weak barycenter for a finite family of input
distributions, and a stochastic algorithm that computes them for arbitrary
populations of laws. The latter approach is particularly well suited for the
streaming setting, i.e., when distributions are observed sequentially. The
notion of weak barycenter and our approaches to compute it are illustrated on
synthetic examples, validated on 2D real-world data and compared to standard
Wasserstein barycenters.
( 2
min )
With the development of hardware accelerators and their corresponding tools,
evaluations have become more affordable through fast and massively parallel
evaluations in some applications. This advancement has drastically sped up the
runtime of evolution-inspired algorithms such as Quality-Diversity
optimization, creating tremendous potential for algorithmic innovation through
scale. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD
algorithm based on Evolution Strategies (ES) designed for fast parallel
evaluations. MEMES builds on top of the existing MAP-Elites-ES algorithm,
scaling it by maintaining multiple independent ES threads with massive
parallelization. We also introduce a new dynamic reset procedure for the
lifespan of the independent ES to autonomously maximize the improvement of the
QD population. We show experimentally that MEMES outperforms existing
gradient-based and objective-agnostic QD algorithms when compared in terms of
generations. We perform this comparison on both black-box optimization and
QD-Reinforcement Learning tasks, demonstrating the benefit of our approach
across different problems and domains. Finally, we also find that our approach
intrinsically enables optimization of fitness locally around a niche, a
phenomenon not observed in other QD algorithms.
( 2
min )
This tutorial introduces the CMA Evolution Strategy (ES), where CMA stands
for Covariance Matrix Adaptation. The CMA-ES is a stochastic, or randomized,
method for real-parameter (continuous domain) optimization of non-linear,
non-convex functions. We try to motivate and derive the algorithm from
intuitive concepts and from requirements of non-linear, non-convex search in
continuous domain.
( 2
min )
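Full CMA-ES adapts a covariance matrix via evolution paths; the sample-rank-recombine loop it builds on can be illustrated with a deliberately simplified ES that adapts only a scalar step size. All names and the crude step-size rule here are illustrative assumptions, not the tutorial's algorithm:

```python
import numpy as np

def simple_es(f, x0, sigma=0.5, lam=20, iters=200, seed=0):
    """Deliberately simplified evolution strategy in the spirit of CMA-ES:
    sample lam offspring from an isotropic Gaussian, recombine the best
    half into the new mean, and adapt only a scalar step size. (Real
    CMA-ES also adapts the full covariance matrix and evolution paths.)"""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    mu = lam // 2
    for _ in range(iters):
        pop = mean + sigma * rng.standard_normal((lam, mean.size))
        order = np.argsort([f(x) for x in pop])
        new_mean = pop[order[:mu]].mean(axis=0)
        # crude step-size rule: grow while the mean is still moving far, else shrink
        sigma *= 1.1 if np.linalg.norm(new_mean - mean) > 0.6 * sigma else 0.9
        mean = new_mean
    return mean

# minimize the 2D sphere function
best = simple_es(lambda x: float(np.sum(x ** 2)), [3.0, -2.0])
```

On convex test functions like the sphere this toy loop converges; the covariance adaptation that gives CMA-ES its power on ill-conditioned, non-convex problems is exactly what is omitted here.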
The use of unlicensed spectrum for cellular systems to mitigate spectrum
scarcity has led to the development of intelligent adaptive approaches to
spectrum access that improve upon traditional carrier sensing and
listen-before-talk methods. We study decentralized contention-based medium
access for base stations (BSs) of a single Radio Access Technology (RAT)
operating on unlicensed shared spectrum. We devise a distributed deep
reinforcement learning-based algorithm for both contention and adaptive
modulation, modelled as a two-state Markov decision process, that attempts to
maximize a network-wide downlink throughput objective. Empirically, we find the
(proportional fairness) reward accumulated by a policy gradient approach to be
significantly higher than even a genie-aided adaptive energy detection
threshold. Our approaches are further validated by improved sum and peak
throughput. The scalability of our approach to large networks is demonstrated
via an improved cumulative reward earned on both indoor and outdoor layouts
with a large number of BSs.
( 2
min )
It is common to utilise dynamic models to measure the tyre-road friction in
real-time. Alternatively, predictive approaches estimate the tyre-road friction
by identifying the environmental factors affecting it. This work aims to
formulate the problem of friction estimation as a visual perceptual learning
task. The problem is broken down into detecting surface characteristics by
applying semantic segmentation and using the extracted features to predict the
frictional force. This work for the first time formulates the friction
estimation problem as a regression from the latent space of a semantic
segmentation model. The preliminary results indicate that this approach can
estimate frictional force.
( 2
min )
In this case study we trained and published a state-of-the-art open-source
model for Automatic Speech Recognition (ASR) for German to evaluate the current
potential of this technology for use in the larger context of Digital
Humanities and cultural heritage indexation. Along with this paper we publish
our wav2vec2-based speech-to-text model and evaluate its performance on a
corpus of historical recordings we assembled, compared against commercial
cloud-based and proprietary services. While our model achieves moderate
results, we see that proprietary cloud services fare significantly better. As
our results show, recognition rates over 90 percent can currently be achieved;
however, these numbers drop quickly once the recordings feature limited audio
quality or the use of non-everyday or outdated language. A big issue is the
wide variety of dialects and accents in the German language. Nevertheless,
this paper highlights that the currently available quality of recognition is
high enough to address various use cases in the Digital Humanities. We argue
that ASR will become a key technology for the documentation and analysis of
audio-visual sources and identify an array of important questions that the DH
community and cultural heritage stakeholders will have to address in the near
future.
( 2
min )
General robotic grippers are challenging to control because of their rich
nonsmooth contact dynamics and the many sources of uncertainties due to the
environment or sensor noise. In this work, we demonstrate how to compute 6-DoF
grasp poses using simulation-based Bayesian inference through the full
stochastic forward simulation of the robot in its environment while robustly
accounting for many of the uncertainties in the system. A Riemannian manifold
optimization procedure preserving the nonlinearity of the rotation space is
used to compute the maximum a posteriori grasp pose. Simulation and physical
benchmarks show the promising high success rate of the approach.
( 2
min )
When dealing with electro or magnetoencephalography records, many supervised
prediction tasks are solved by working with covariance matrices to summarize
the signals. Learning with these matrices requires using Riemannian geometry to
account for their structure. In this paper, we propose a new method to deal
with distributions of covariance matrices and demonstrate its computational
efficiency on M/EEG multivariate time series. More specifically, we define a
Sliced-Wasserstein distance between measures of symmetric positive definite
matrices that comes with strong theoretical guarantees. Then, we take advantage
of its properties and kernel methods to apply this distance to brain-age
prediction from MEG data and compare it to state-of-the-art algorithms based on
Riemannian geometry. Finally, we show that it is an efficient surrogate to the
Wasserstein distance in domain adaptation for Brain Computer Interface
applications.
( 2
min )
An efficient deep learning model that can be implemented in real-time for
polyp detection is crucial to reducing polyp miss-rate during screening
procedures. Convolutional neural networks (CNNs) are vulnerable to small
changes in the input image. A CNN-based model may miss the same polyp appearing
in a series of consecutive frames and produce unstable detection output due to
changes in camera pose, lighting condition, light reflection, etc. In this
study, we attempt to tackle this problem by integrating temporal information
among neighboring frames. We propose an efficient feature concatenation method
for a CNN-based encoder-decoder model without adding complexity to the model.
The proposed method incorporates extracted feature maps of previous frames to
detect polyps in the current frame. The experimental results demonstrate that
the proposed method of feature concatenation improves the overall performance
of automatic polyp detection in videos. The following results are obtained on a
public video dataset: sensitivity 90.94%, precision 90.53%, and specificity
92.46%.
( 2
min )
Accuracy validation of cortical thickness measurement is a difficult problem
due to the lack of ground truth data. To address this need, many methods have
been developed to synthetically induce gray matter (GM) atrophy in an MRI via
deformable registration, creating a set of images with known changes in
cortical thickness. However, these methods often cause blurring in atrophied
regions, and cannot simulate realistic atrophy within deep sulci where
cerebrospinal fluid (CSF) is obscured or absent. In this paper, we present a
solution using a self-supervised inpainting model to generate CSF in these
regions and create images with more plausible GM/CSF boundaries. Specifically,
we introduce a novel, 3D GAN model that incorporates patch-based dropout
training, edge map priors, and sinusoidal positional encoding, all of which are
established methods previously limited to 2D domains. We show that our
framework significantly improves the quality of the resulting synthetic images
and is adaptable to unseen data with fine-tuning. We also demonstrate that our
resulting dataset can be employed for accuracy validation of cortical
segmentation and thickness measurement.
( 2
min )
We provide an example of a distribution preserving source separation method,
which aims at addressing perceptual shortcomings of state-of-the-art methods.
Our approach uses unconditioned generative models of signal sources.
Reconstruction is achieved by means of mix-consistent sampling from a
distribution conditioned on a realization of a mix. The separated signals
follow their respective source distributions, which provides an advantage when
separation results are evaluated in a listening test.
( 2
min )
3D human mesh recovery from a 2D pose plays an important role in various
applications. However, it is hard for existing methods to simultaneously
capture the multiple relations during the evolution from skeleton to mesh,
including joint-joint, joint-vertex and vertex-vertex relations, which often
leads to implausible results. To address this issue, we propose a novel
solution, called GATOR, that contains an encoder of Graph-Aware Transformer
(GAT) and a decoder with Motion-Disentangled Regression (MDR) to explore these
multiple relations. Specifically, GAT combines a GCN and a graph-aware
self-attention in parallel to capture physical and hidden joint-joint
relations. Furthermore, MDR models joint-vertex and vertex-vertex interactions
to explore joint and vertex relations. Based on the clustering characteristics
of vertex offset fields, MDR regresses the vertices by composing the predicted
base motions. Extensive experiments show that GATOR achieves state-of-the-art
performance on two challenging benchmarks.
( 2
min )
Modelling dynamical systems is an integral component for understanding the
natural world. To this end, neural networks are becoming an increasingly
popular candidate owing to their ability to learn complex functions from large
amounts of data. Despite this recent progress, there has not been an adequate
discussion on the architectural regularization that neural networks offer when
learning such systems, hindering their efficient usage. In this paper, we
initiate a discussion in this direction using coordinate networks as a test
bed. We interpret dynamical systems and coordinate networks from a signal
processing lens, and show that simple coordinate networks with few layers can
be used to solve multiple problems in modelling dynamical systems, without any
explicit regularizers.
( 2
min )
Agglomerative hierarchical clustering based on Ordered Weighted Averaging
(OWA) operators not only generalises the single, complete, and average
linkages, but also includes intercluster distances based on a few nearest or
farthest neighbours, trimmed and winsorised means of pairwise point
similarities, amongst many others. We explore the relationships between the
famous Lance-Williams update formula and the extended OWA-based linkages with
weights generated via infinite coefficient sequences. Furthermore, we provide
some conditions on the weight generators that guarantee the resulting
dendrograms are free from unaesthetic inversions.
( 2
min )
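The unifying idea can be sketched directly: sort all pairwise inter-cluster distances and take their ordered weighted average, so that particular weight vectors recover single, complete, and average linkage. This is a hedged illustration of the OWA linkage concept, not the paper's framework; the function name is made up:

```python
import numpy as np

def owa_linkage(cluster_a, cluster_b, weights):
    """OWA-based intercluster distance sketch: sort all pairwise distances
    in ascending order and take their ordered weighted average.
    weights = [1, 0, ..., 0] gives single linkage (the minimum distance),
    [0, ..., 0, 1] gives complete linkage (the maximum), and uniform
    weights give average linkage."""
    d = np.sort([np.linalg.norm(x - y) for x in cluster_a for y in cluster_b])
    w = np.asarray(weights, dtype=float)
    return float(d @ (w / w.sum()))

A = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
B = [np.array([3.0, 0.0]), np.array([5.0, 0.0])]
single = owa_linkage(A, B, [1, 0, 0, 0])    # min of {2, 3, 4, 5}
complete = owa_linkage(A, B, [0, 0, 0, 1])  # max of {2, 3, 4, 5}
```

Trimmed or winsorised variants mentioned in the abstract correspond to weight vectors that zero out or clamp the extreme order statistics.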
We propose a new 6-DoF grasp pose synthesis approach from 2D/2.5D input based
on keypoints. Keypoint-based grasp detector from image input has demonstrated
promising results in the previous study, where the additional visual
information provided by color images compensates for the noisy depth
perception. However, it relies heavily on accurately predicting the location of
keypoints in the image space. In this paper, we devise a new grasp generation
network that reduces the dependency on precise keypoint estimation. Given an
RGB-D input, our network estimates both the grasp pose from keypoint detection
as well as scale towards the camera. We further re-design the keypoint output
space in order to mitigate the negative impact of keypoint prediction noise to
Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method
outperforms the baseline by a large margin, validating the efficacy of our
approach. Finally, despite trained on simple synthetic objects, our method
demonstrate sim-to-real capacity by showing competitive results in real-world
robot experiments.
( 2
min )
Despite the impressive performance of vision-based pose estimators, they
generally fail to perform well under adverse vision conditions and often don't
satisfy the privacy demands of customers. As a result, researchers have begun
to study tactile sensing systems as an alternative. However, these systems
suffer from noisy and ambiguous recordings. To tackle this problem, we propose
a novel solution for pose estimation from ambiguous pressure data. Our method
comprises a spatio-temporal vision transformer with an encoder-decoder
architecture. Detailed experiments on two popular public datasets reveal that
our model outperforms existing solutions in the area. Moreover, we observe that
increasing the number of temporal crops in the early stages of the network
positively impacts the performance while pre-training the network in a
self-supervised setting using a masked auto-encoder approach also further
improves the results.
( 2
min )
Rainfall data collected by various remote sensing instruments such as radars
or satellites has different space-time resolutions. This study aims to improve
the temporal resolution of radar rainfall products to help with more accurate
climate change modeling and studies. In this direction, we introduce a solution
based on EfficientNetV2, namely EfficientTempNet, to increase the temporal
resolution of radar-based rainfall products from 10 minutes to 5 minutes. We
tested EfficientTempNet on a dataset for the state of Iowa, US, and compared
its performance to three different baselines to show that EfficientTempNet
presents a viable option for better climate change monitoring.
( 2
min )
Tensor decomposition is now being used for data analysis, information
compression, and knowledge recovery. However, the mathematical property of
tensor decomposition is not yet fully clarified because it is one of singular
learning machines. In this paper, we give the upper bound of its real log
canonical threshold (RLCT) of the tensor decomposition by using an algebraic
geometrical method and derive its Bayesian generalization error theoretically.
We also give considerations about its mathematical property through numerical
experiments.
( 2
min )
Automatic Speech Recognition (ASR) in medical contexts has the potential to
save time, cut costs, increase report accuracy, and reduce physician burnout.
However, the healthcare industry has been slower to adopt this technology, in
part due to the importance of avoiding medically-relevant transcription
mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR
metric that penalizes clinically-relevant mistakes more than others. We
demonstrate that this metric more closely aligns with clinician preferences on
medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.),
sometimes by wide margins. We collect a benchmark of 13 clinician preferences
on 149 realistic medical sentences called the Clinician Transcript Preference
benchmark (CTP), demonstrate that CBERTScore more closely matches what
clinicians prefer, and release the benchmark for the community to further
develop clinically-aware ASR metrics.
( 2
min )
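The actual CBERTScore builds on BERT embeddings. Purely as an illustration of the underlying idea of penalizing clinically relevant mistakes more than others, here is a toy word-level edit metric with a higher cost for a user-supplied set of critical terms; everything below is an assumption for illustration, not the paper's metric:

```python
def weighted_wer(ref, hyp, critical, penalty=3.0):
    """Toy illustration (not CBERTScore): a weighted word error rate where
    insertions, deletions, and substitutions touching clinically critical
    terms cost `penalty` instead of 1."""
    r, h = ref.split(), hyp.split()
    cost = lambda w: penalty if w in critical else 1.0
    # dp[i][j] = min weighted edits to turn r[:i] into h[:j]
    dp = [[0.0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        dp[i][0] = dp[i - 1][0] + cost(r[i - 1])      # deletions
    for j in range(1, len(h) + 1):
        dp[0][j] = dp[0][j - 1] + cost(h[j - 1])      # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0.0 if r[i - 1] == h[j - 1] else max(cost(r[i - 1]), cost(h[j - 1]))
            dp[i][j] = min(dp[i - 1][j] + cost(r[i - 1]),
                           dp[i][j - 1] + cost(h[j - 1]),
                           dp[i - 1][j - 1] + sub)
    return dp[-1][-1] / max(len(r), 1)

# swapping one drug name for another is penalized 3x a harmless word error
score = weighted_wer("give 10 mg aspirin", "give 10 mg warfarin",
                     critical={"aspirin", "warfarin"})
```

Under this scheme two transcripts with identical plain WER can receive very different scores, which is the property the clinician-preference benchmark is testing for.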
Classical multidimensional scaling (CMDS) is a technique that aims to embed a
set of objects in a Euclidean space given their pairwise Euclidean distance
matrix. The main part of CMDS is based on double centering a squared distance
matrix and employing a truncated eigendecomposition to recover the point
coordinates. A central result in CMDS connects the squared Euclidean matrix to
a Gram matrix derived from the set of points. In this paper, we study a dual
basis approach to classical multidimensional scaling. We give an explicit
formula for the dual basis and fully characterize the spectrum of an essential
matrix in the dual basis framework. We make connections to a related problem in
metric nearness.
( 2
min )
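The double-centering construction described above is short enough to write out; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def classical_mds(D2, k=2):
    """Classical MDS sketch: double-center the squared Euclidean distance
    matrix to recover a Gram matrix, then take a truncated
    eigendecomposition to obtain point coordinates."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ D2 @ J                 # Gram matrix of centered points
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # keep the k largest
    L = np.sqrt(np.clip(w[idx], 0.0, None))
    return V[:, idx] * L                  # n x k embedding

# recover a 2D configuration from its pairwise squared distances
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Y = classical_mds(D2, k=2)
```

The Gram matrix B here is the "central result" the abstract refers to; the embedding Y reproduces the original pairwise distances up to a rigid motion.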
Unfolding networks have shown promising results in the Compressed Sensing
(CS) field. Yet, the investigation of their generalization ability is still in
its infancy. In this paper, we perform generalization analysis of a
state-of-the-art ADMM-based unfolding network, which jointly learns a decoder
for CS and a sparsifying redundant analysis operator. To this end, we first
impose a structural constraint on the learnable sparsifier, which parametrizes
the network's hypothesis class. For the latter, we estimate its Rademacher
complexity. With this estimate in hand, we deliver generalization error bounds
for the examined network. Finally, the validity of our theory is assessed and
numerical comparisons to a state-of-the-art unfolding network are made, on
synthetic and real-world datasets. Our experimental results demonstrate that
our proposed framework complies with our theoretical findings and outperforms
the baseline, consistently for all datasets.
( 2
min )
In recent years, knowledge distillation has become a cornerstone of
efficiently deployed machine learning, with labs and industries using knowledge
distillation to train models that are inexpensive and resource-optimized.
Trojan attacks have contemporaneously gained significant prominence, revealing
fundamental vulnerabilities in deep learning models. Given the widespread use
of knowledge distillation, in this work we seek to exploit the unlabelled data
knowledge distillation process to embed Trojans in a student model without
introducing conspicuous behavior in the teacher. We ultimately devise a Trojan
attack that effectively reduces student accuracy, does not alter teacher
performance, and is efficiently constructible in practice.
( 2
min )
The estimation of probability density functions is a non-trivial task that in
recent years has been tackled with machine learning techniques.
Successful applications can be obtained using models inspired by the Boltzmann
machine (BM) architecture. In this manuscript, the product Jacobi-Theta
Boltzmann machine (pJTBM) is introduced as a restricted version of the
Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection
matrix. We show that score matching, based on the Fisher divergence, can be
used to fit probability densities with the pJTBM more efficiently than with the
original RTBM.
( 2
min )
submitted by /u/actmademewannakms
[link] [comments]
( 43
min )
submitted by /u/Amazing_Painter_7692
[link] [comments]
( 44
min )
submitted by /u/fchung
[link] [comments]
( 46
min )
I put together this plain PyTorch implementation of LLaMA (I just substituted the fairscale layers with the native ones and converted the weights accordingly) that can be more easily run in different environments.
The big problem with the official implementation is that to run the 65B version you need 8 GPUs no matter what, to run the 30B version you need 4, and so on. In reality you can easily fit the 65B version in 2 A100s with 100 GB of VRAM.
vanilla-llama solves this problem: you just need enough memory, and the model will be loaded across all the available GPUs.
https://github.com/galatolofederico/vanilla-llama
submitted by /u/poppear
[link] [comments]
( 43
min )
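The underlying idea of filling whichever GPUs have free memory can be sketched as a simple greedy assignment. This is a hypothetical illustration of the placement strategy, not vanilla-llama's actual code; the layer sizes and device capacities are made-up numbers:

```python
# Greedy assignment of model layers to devices by remaining capacity.
# Hypothetical sketch: real loaders also budget for activations and overhead.

def assign_layers(layer_sizes_gb, device_capacities_gb):
    """Map each layer index to a device index, filling devices in order."""
    remaining = list(device_capacities_gb)
    placement = {}
    device = 0
    for i, size in enumerate(layer_sizes_gb):
        # Move to the next device once the current one is full.
        while device < len(remaining) and remaining[device] < size:
            device += 1
        if device == len(remaining):
            raise MemoryError("not enough total device memory for the model")
        remaining[device] -= size
        placement[i] = device
    return placement

# 80 layers of ~1.6 GB each (roughly a 65B model in fp16) on two 100 GB GPUs.
placement = assign_layers([1.6] * 80, [100, 100])
```

With these numbers the first GPU takes 62 layers and the remaining 18 spill onto the second, which is why two large cards suffice where the official sharding demands eight.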
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/oreosqueen6
[link] [comments]
( 41
min )
submitted by /u/Illustrious-Sign3015
[link] [comments]
( 41
min )
We take a closer look at Aicolumns, an online platform dedicated to artificial intelligence. Discover the latest AI tools, trends, and insights from a team of expert writers. Whether you're a seasoned AI professional or just starting out, aicolumns.com is your ultimate guide to all things AI.
https://youtu.be/927XESjV3kg
submitted by /u/Bassissou23
[link] [comments]
( 41
min )
submitted by /u/Wireless_Life
[link] [comments]
( 41
min )
submitted by /u/barrese87
[link] [comments]
( 41
min )
submitted by /u/tottocotunio
[link] [comments]
( 41
min )
submitted by /u/MsNunez
[link] [comments]
( 41
min )
submitted by /u/TallSide7746
[link] [comments]
( 41
min )
submitted by /u/Number_5_alive
[link] [comments]
( 41
min )
submitted by /u/MusabShakeel
[link] [comments]
( 41
min )
submitted by /u/RobotArtificial
[link] [comments]
( 41
min )
submitted by /u/ai-lover
[link] [comments]
( 41
min )
submitted by /u/SuspiciousPillbox
[link] [comments]
( 41
min )
submitted by /u/keghn
[link] [comments]
( 41
min )
submitted by /u/XiaolongWang
[link] [comments]
( 43
min )
https://github.com/jacobgil/confidenceinterval
pip install confidenceinterval
tl;dr: you no longer have an excuse not to use confidence intervals!
In statistics, confidence intervals are commonly reported alongside accuracy metrics to help interpret them.
For example, an AUC metric might be 0.9, but if the 95% confidence interval is [0.7, 0.96], we can't confidently say we didn't just get lucky; we should be really careful making decisions around that result.
More formally, a confidence interval gives us a range for where the true, unknown accuracy metric could lie. A 95% confidence interval means that if we repeated the experiment many times, 95% of the intervals we reported would contain the actual true metric. This property is called coverage.
…
( 45
min )
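For intuition, here is a minimal Wilson score interval for a proportion-style metric such as accuracy, written from scratch rather than through the package's own API (which may differ):

```python
import math

def wilson_interval(correct, total, z=1.96):
    """95% Wilson score interval for a binomial proportion (e.g. accuracy)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return center - half, center + half

# 90 correct out of 100: the 0.9 point estimate comes with a wide interval.
lo, hi = wilson_interval(90, 100)
```

Even with 100 samples the interval spans roughly [0.83, 0.94], which is exactly the kind of uncertainty the post argues we should be reporting.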
submitted by /u/Simusid
[link] [comments]
( 50
min )
submitted by /u/Soft-Material3294
[link] [comments]
( 43
min )
submitted by /u/madredditscientist
[link] [comments]
( 45
min )
Decompose Python libraries and generate coherent hierarchical topic models of the repository.
https://github.com/danielpatrickhug/GitModel
The ability to bootstrap its own codebase is a powerful feature: the codebase is designed so that it can use its own output as input to improve itself. In GitModel's case, generating hierarchical topic trees of GitHub repositories lets it analyze and extract insights from its own codebase (and others) to improve its functionality. This can lead to more effective code generation, better semantic graph generation, and improved text generation capabilities.
I spent around 10 hours today on a major refactor, creating a simple pipeline abstraction and allowing dynamic instantiation from YAML configs. It now also supports multiple GNN heads.
Please try it out and let me know what you think!
Example:
https://github.com/deepmind/clrs
https://preview.redd.it/ut4fc6c401na1.png?width=1506&format=png&auto=webp&s=d757356424b933cfa039cd922e27ec85bdffe0d4
submitted by /u/NovelspaceOnly
[link] [comments]
( 48
min )
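The dynamic-instantiation idea mentioned above can be sketched with the standard library alone. This is the generic pattern, not GitModel's actual config schema; the `target`/`kwargs` keys are assumptions for illustration:

```python
import importlib

def instantiate(config):
    """Build an object from a config dict, e.g. one parsed from YAML."""
    module_name, _, class_name = config["target"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config.get("kwargs", {}))

# Example: build a collections.Counter from a config entry.
config = {"target": "collections.Counter", "kwargs": {"a": 2, "b": 1}}
obj = instantiate(config)
```

The appeal of this pattern is that swapping a pipeline component (say, one GNN head for another) becomes a config edit rather than a code change.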
submitted by /u/RobotArtificial
[link] [comments]
( 41
min )
submitted by /u/barrese87
[link] [comments]
( 41
min )
submitted by /u/RobotArtificial
[link] [comments]
( 41
min )
submitted by /u/merino_london16
[link] [comments]
( 42
min )
Midjourney seems to consistently have the best results. I've had very mixed results with Stable Diffusion, Lexica, and others like OpenJourney.
What model comes closest to Midjourney's results but is open source and/or has an API?
submitted by /u/sideprojects_ai
[link] [comments]
( 41
min )
submitted by /u/RobotArtificial
[link] [comments]
( 41
min )
submitted by /u/SupPandaHugger
[link] [comments]
( 41
min )
submitted by /u/RobotArtificial
[link] [comments]
( 41
min )
submitted by /u/barrese87
[link] [comments]
( 41
min )
submitted by /u/csansoon
[link] [comments]
( 41
min )
submitted by /u/henlo_there_fren
[link] [comments]
( 41
min )
submitted by /u/LincolnOsiris_
[link] [comments]
( 41
min )
submitted by /u/Calatravo
[link] [comments]
( 41
min )
AI Weirdness: the strange side of machine learning
( 2
min )
submitted by /u/keghn
[link] [comments]
( 41
min )
submitted by /u/keghn
[link] [comments]
( 41
min )
We study a heterogeneous agent macroeconomic model with an infinite number of
households and firms competing in a labor market. Each household earns income
and engages in consumption at each time step while aiming to maximize a concave
utility subject to the underlying market conditions. The households aim to find
the optimal saving strategy that maximizes their discounted cumulative utility
given the market condition, while the firms determine the market conditions
through maximizing corporate profit based on the household population behavior.
The model captures a wide range of applications in macroeconomic studies, and
we propose a data-driven reinforcement learning framework that finds the
regularized competitive equilibrium of the model. The proposed algorithm enjoys
theoretical guarantees in converging to the equilibrium of the market at a
sub-linear rate.
( 2
min )
Bayesian Causal Forests (BCF) is a causal inference machine learning model
based on a highly flexible non-parametric regression and classification tool
called Bayesian Additive Regression Trees (BART). Motivated by data from the
Trends in International Mathematics and Science Study (TIMSS), which includes
data on student achievement in both mathematics and science, we present a
multivariate extension of the BCF algorithm. With the help of simulation
studies we show that our approach can accurately estimate causal effects for
multiple outcomes subject to the same treatment. We also apply our model to
Irish data from TIMSS 2019. Our findings reveal the positive effects of having
access to a study desk at home (Mathematics ATE 95% CI: [0.20, 11.67]) while
also highlighting the negative consequences of students often feeling hungry at
school (Mathematics ATE 95% CI: [-11.15, -2.78], Science ATE 95% CI:
[-10.82, -1.72]) or often being absent (Mathematics ATE 95% CI: [-12.47,
-1.55]).
( 2
min )
We introduce a class of networked Markov potential games where agents are
associated with nodes in a network. Each agent has its own local potential
function, and the reward of each agent depends only on the states and actions
of agents within a $\kappa$-hop neighborhood. In this context, we propose a
localized actor-critic algorithm. The algorithm is scalable since each agent
uses only local information and does not need access to the global state.
Further, the algorithm overcomes the curse of dimensionality through the use of
function approximation. Our main results provide finite-sample guarantees up to
a localization error and a function approximation error. Specifically, we
achieve an $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity measured by
the averaged Nash regret. This is the first finite-sample bound for multi-agent
competitive games that does not depend on the number of agents.
( 2
min )
A rigorous formalization of desired system requirements is indispensable when
performing any verification task. This often limits the application of
verification techniques, as writing formal specifications is an error-prone and
time-consuming manual task. To facilitate this, we present nl2spec, a framework
for applying Large Language Models (LLMs) to derive formal specifications (in
temporal logics) from unstructured natural language. In particular, we
introduce a new methodology to detect and resolve the inherent ambiguity of
system requirements in natural language: we utilize LLMs to map subformulas of
the formalization back to the corresponding natural language fragments of the
input. Users iteratively add, delete, and edit these sub-translations to amend
erroneous formalizations, which is easier than manually redrafting the entire
formalization. The framework is agnostic to specific application domains and
can be extended to similar specification languages and new neural models. We
perform a user study to obtain a challenging dataset, which we use to run
experiments on the quality of translations. We provide an open-source
implementation, including a web-based frontend.
( 2
min )
Blackwell's approachability is a very general sequential decision framework
where a Decision Maker obtains vector-valued outcomes, and aims at the
convergence of the average outcome to a given "target" set. Blackwell gave a
sufficient condition for the decision maker having a strategy guaranteeing such
a convergence against an adversarial environment, as well as what we now call
the Blackwell's algorithm, which then ensures convergence. Blackwell's
approachability has since been applied to numerous problems, in online learning
and game theory, in particular. We extend this framework by allowing the
outcome function and the dot product to be time-dependent. We establish a
general guarantee for the natural extension to this framework of Blackwell's
algorithm. In the case where the target set is an orthant, we present a family
of time-dependent dot products which yields different convergence speeds for
each coordinate of the average outcome. We apply this framework to the Big
Match (one of the most important toy examples of stochastic games) where an
$\epsilon$-uniformly optimal strategy for Player I is given by Blackwell's
algorithm in a well-chosen auxiliary approachability problem.
( 2
min )
submitted by /u/dharambir_iitk
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/liquidocelotYT
[link] [comments]
( 41
min )
submitted by /u/MsNunez
[link] [comments]
( 41
min )
submitted by /u/Zirius_Sadfaces
[link] [comments]
( 41
min )
submitted by /u/bukowski3000
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/Parth-Prajapati
[link] [comments]
( 43
min )
submitted by /u/catalinghita8
[link] [comments]
( 42
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/joelwohlhauser
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/Calatravo
[link] [comments]
( 41
min )
Samples can be found here and here. See how they compare to the original chorales and fugues.
The model uses a Transformer encoder architecture to complete partially corrupted sequence representations of music. A version of Gibbs sampling is then used to construct new music from scratch. The entire model was trained in under 30 minutes on a single Tesla V100, really showcasing the efficiency of Transformers in general.
Note that the fugue samples are seeded by the first three bars of an actual Bach fugue. The chorales are generated completely from scratch!
For more information on how it works - see the GitHub repo or follow me on Twitter.
submitted by /u/ustainbolt
[link] [comments]
( 43
min )
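The generate-from-scratch procedure described above can be sketched schematically: start from random tokens, then repeatedly mask a block and resample it conditioned on the rest. The `predict` function below is a random stand-in for the trained encoder, not the actual model, so this only illustrates the shape of the sampling loop:

```python
import random

VOCAB = list(range(16))  # toy pitch vocabulary

def predict(sequence, masked_positions):
    """Stand-in for the Transformer: fill masked positions at random."""
    return {i: random.choice(VOCAB) for i in masked_positions}

def gibbs_generate(length=32, steps=100, block=4, seed=0):
    """Blocked Gibbs sampling: iteratively re-fill small masked blocks."""
    random.seed(seed)
    seq = [random.choice(VOCAB) for _ in range(length)]
    for _ in range(steps):
        start = random.randrange(length - block + 1)
        masked = range(start, start + block)
        filled = predict(seq, masked)  # resample the block given the rest
        for i, token in filled.items():
            seq[i] = token
    return seq

sample = gibbs_generate()
```

With a real masked model in place of `predict`, each resampling step pulls the sequence toward locally coherent music, which is how completion training yields generation.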
submitted by /u/blabboy
[link] [comments]
( 43
min )
I recently delved into the world of transformers and their application to vision tasks.
As part of my learning process, I implemented the Vision Transformer (ViT) from scratch using PyTorch. In this post I'm sharing my implementation along with a step-by-step guide to building the model.
I hope you find it helpful.
Github: https://github.com/tintn/vision-transformer-from-scratch
Post: https://medium.com/towards-data-science/implementing-vision-transformer-vit-from-scratch-3e192c6155f0
submitted by /u/Tin_Ng
[link] [comments]
( 43
min )
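As a companion to the linked guide, the first step of any ViT, splitting an image into flattened patches, can be sketched in NumPy. This is a generic sketch of the operation, not code taken from the repo:

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Carve the grid of patches, then flatten each patch into one token.
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * c)

# A 224x224 RGB image with 16x16 patches gives the familiar 196 tokens.
tokens = patchify(np.zeros((224, 224, 3)), 16)
```

Each 768-dimensional row is then linearly projected to the embedding dimension before the Transformer encoder sees it.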
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio. Data Wrangler enables you to access data from a wide variety of popular sources (Amazon S3, Amazon Athena, Amazon Redshift, Amazon EMR and Snowflake) and over 40 other third-party sources. […]
( 10
min )
In this two-part series, we demonstrate how to label and train models for 3D object detection tasks. In part 1, we discuss the dataset we’re using, as well as any preprocessing steps, to understand and label data. In part 2, we walk through how to train a model on your dataset and deploy it to […]
( 13
min )
Online fraud has a widespread impact on businesses and requires an effective end-to-end strategy to detect and prevent new account fraud and account takeovers, and stop suspicious payment transactions. In this post, we show a serverless approach to detect online transaction fraud in near-real time. We show how you can apply this approach to various data streaming and event-driven architectures, depending on the desired outcome and actions to take to prevent fraud (such as alert the user about the fraud or flag the transaction for additional review).
( 7
min )
Aleksander Mądry urges lawmakers to ask rigorous questions about how AI tools are being used by corporations.
( 8
min )
The computer science and philosophy double-major aims to advance the field of AI ethics.
( 9
min )
submitted by /u/Dendrophile_guy
[link] [comments]
( 41
min )
submitted by /u/webmanpt
[link] [comments]
( 41
min )
submitted by /u/_utisz_
[link] [comments]
( 41
min )
submitted by /u/A_single_french_fry
[link] [comments]
( 41
min )
submitted by /u/tomd_96
[link] [comments]
( 41
min )
submitted by /u/harttrav
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/jsonathan
[link] [comments]
( 42
min )
submitted by /u/thejashGI
[link] [comments]
( 41
min )
submitted by /u/Kiizmod0
[link] [comments]
( 46
min )
The AI landscape is being reshaped by the rise of generative models capable of synthesizing high-quality data, such as text, images, music, and videos. The course toward democratization of AI helped to further popularize generative AI following the open-source releases for such foundation model families as BERT, T5, GPT, CLIP and, most recently, Stable Diffusion. […]
( 9
min )
As machine learning (ML) models have improved, data scientists, ML engineers and researchers have shifted more of their attention to defining and bettering data quality. This has led to the emergence of a data-centric approach to ML and various techniques to improve model performance by focusing on data requirements. Applying these techniques allows ML practitioners […]
( 9
min )
Aided by machine learning, scientists are working to develop a vaccine that would be effective against all SARS-Cov-2 strains.
( 10
min )
It’s a thrilling GFN Thursday with GRID Legends racing to the cloud this week. It leads a total of eight new games expanding the GeForce NOW library. New content for Rainbow Six Siege is also now streaming. Plus, two new cities are now online with GeForce RTX 4080 performance for cloud gaming. Chicago and Montreal Read article >
( 6
min )
Hi, I work at Intel as an academic outreach coordinator. I'm sharing Intel's open-source OpenVINO toolkit for optimizing and deploying AI inference on CPUs, discrete and integrated GPUs, and other accelerators like Movidius VPUs and Intel FPGAs. The GitHub repo has over 60 Jupyter notebooks that work on Intel PCs/laptops using Windows and Linux, or on Macs running macOS, including M1 processors.
Try out the Stable Diffusion Jupyter notebook (#225), or the vehicle recognition and detection notebook (#218).
It's easy to install with pip: 9 simple steps on Windows, 8 on macOS, and 7 on Linux.
submitted by /u/JayMBurris
[link] [comments]
( 43
min )
submitted by /u/israelavila
[link] [comments]
( 41
min )
submitted by /u/oridnary_artist
[link] [comments]
( 41
min )
submitted by /u/LordPewPew777
[link] [comments]
( 41
min )
submitted by /u/liquidocelotYT
[link] [comments]
( 41
min )
It took me a couple of tries, but I think overall the results are impressive. Here it is:
https://www.youtube.com/watch?v=LcrLopIoJeA&t=14s&ab_channel=Triviadetodo
submitted by /u/laburanta
[link] [comments]
( 41
min )
submitted by /u/oridnary_artist
[link] [comments]
( 41
min )
submitted by /u/Huguini
[link] [comments]
( 41
min )
submitted by /u/h_xiao
[link] [comments]
( 41
min )
submitted by /u/liquidocelotYT
[link] [comments]
( 41
min )
MIT researchers uncover the structural properties and dynamics of deep classifiers, offering novel explanations for optimization, generalization, and approximation in deep networks.
( 8
min )
Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. SageMaker provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so […]
( 10
min )
This post is co-authored with Hernan Figueroa, Sr. Manager Data Science at Marubeni Power International. Marubeni Power International Inc (MPII) owns and invests in power business platforms in the Americas. An important vertical for MPII is asset management for renewable energy and energy storage assets, which are critical to reduce the carbon intensity of our […]
( 10
min )
Reinforcement learning (RL) encompasses a class of machine learning (ML) techniques that can be used to solve sequential decision-making problems. RL techniques have found widespread applications in numerous domains, including financial services, autonomous navigation, industrial control, and e-commerce. The objective of an RL problem is to train an agent that, given an observation from its […]
( 11
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )